Pradipto Das
Research Engineer / PhD Student, SUNY Buffalo, USA Address: 16 Flickinger Court, Apt. B, Amherst, NY 14228
Email: *****@*******.***, Web: www.buffalo.edu/~pdas3 Phone: 716-***-****
Core Proficiencies: Well-rounded technical knowledge of applying machine learning and natural language processing techniques
to artificial intelligence problems involving multimodal data
Professional Summary
Data Mining / Text Mining Expertise:
Over 5 years of hands-on experience in exploratory data analysis, with a focus on unsupervised graphical models for problems
involving topical analysis of data spanning different modalities, such as text and video
Video-to-text and text-to-video translation without expensive manual frame-by-frame annotation; video event detection
using supervised classification techniques such as support vector machines and logistic regression
Developing probabilistic browsing models from scratch for unstructured and semi-structured documents
Developing state-of-the-art multi-document summarization systems
Exposure to cluster computing through MPI and Hadoop's MapReduce
Career Chronology and Accomplishments
I. Research Engineer, CSE Department, SUNY Buffalo, Buffalo, NY, USA (Spring 2011 to present)
A. Project: Natural Language Based Multimedia Event Detection/Recounting (MED/MER)
Successfully completed a large project on translating videos to keywords and back without expensive manual video annotation
Accomplishments:
Eliminated the need for expensive manual frame-by-frame text annotations describing the major contents of a video
Enhanced video clustering and search through natural language descriptions
System ranked first in the TRECVID 2012 Multimedia Event Recounting track, matching videos of a given abstract event to
specific event descriptions based purely on predicted text
Joint research in collaboration with Honeywell ACS Labs, MN; Kitware Inc., NY; Stanford University; Simon Fraser University;
and Georgia Tech [project funded by IARPA's ALADDIN program]
B. Project: Exploratory Data Analysis and Multi-document Summarization using Topic Models
Successfully formulated and implemented from scratch bi-perspective topic models for a ubiquitous document
representation: documents that carry both word-level annotation classes and document-level tags
Implemented a summarization system using bi-perspective topic models and document-centric linguistic features that can
summarize multiple documents into one short bulleted-list summary
Accomplishments:
Raised system performance to be on par with state-of-the-art newswire summarization systems, as measured on the
Guided Summarization datasets from the Text Analysis Conference (TAC)
C. Project: Mining Online E-learning Discussion Forums for Non-topical and Topical Analysis
Devised an algorithmic solution to identify textbook concepts in e-learning discussion forum posts, thereby mapping the forum
posts to the tables of contents of textbooks
Accomplishments:
Successfully applied topic models to discriminate between on-textbook and off-textbook content in the discussion
forums, using domain knowledge from e-textbooks
Gained experience with Hadoop and Hive by writing simple data pre-processing methods
[Project funded by Apollo Group Inc., University of Phoenix Distance Learning School]
II. Research Intern, Janya Inc., Amherst, NY, USA (Summer 2010)
A. Project: Gibbs-Sampling-Based Topic Modeling Framework for the Semantex Text Analytics Processor Pipeline
Improved product capabilities by adding corpus-based solutions in addition to document/sentence-centric models
III. Visiting Research Fellow, Center for Soft Computing Research, ISI, Kolkata, India (Aug 2005 to Jul 2006)
A. Project: DIET: Directional Entropy Based Corner Detection in Gray-scale Images [Spring 2006]
Implemented different entropy measures on the gradients surrounding edges in an image to detect corners
Accomplishments:
The proposed method performed on par with the best geometric corner detection methods in detection performance, but with
lower compute time
IV. Assistant Systems Engineer, Tata Consultancy Services Ltd. (TCS), Kolkata, India (Aug 2004 to Jul 2005)
A. Project: TCS Kolkata Intranet Portal: Connect Kolkata
Developed from scratch on a J2EE architecture using the Jakarta Struts 1.1 framework, JSP, and Oracle 9i as the
relational database
V. Project Intern, Machine Intelligence Unit, Indian Statistical Institute (ISI), Kolkata, India (Spring 2004)
A. Project: Statistical Outlier Detection in Large Multivariate Datasets
Utilized Tukey's biweight estimator, robust Mahalanobis distances, and non-parametric Parzen-window-based
unsupervised density estimation to find outliers lying in the tail of the distance-from-median distribution of the data
Education
University at Buffalo, State University of New York (SUNY Buffalo), USA
PhD, Computer Science, Aug 2006 to present (expected Spring 2013) [GPA 3.678/4.0]
West Bengal University of Technology, Kolkata, India
MCA (Master of Computer Applications) July 2004 [GPA 8.84/10.0]
Jadavpur University, Kolkata, India
BS (Honors) Mathematics July 2001 [First Division]
Publications
[6] P. Das, R. K. Srihari and J. J. Corso, Translating Related Words to Videos and Back through Latent Topics, in Proceedings of the
Sixth ACM International Conference on Web Search and Data Mining (WSDM), Rome, Italy, Feb 4-8, 2013 [oral presentation]
[5] P. Das and R. K. Srihari, Using Tag-Topic Models and Rhetorical Structure Trees to Generate Bulleted List Summaries
[submitted; a shorter version was selected for oral presentation and appears in Proceedings of the NIST Text Analysis Conference,
Nov 2011, Gaithersburg, MD, www.nist.gov/tac/publications/2011/participant.papers/UBSummarizer.proceedings.pdf]
[4] P. Das, R. K. Srihari and Y. Fu, Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives, in
Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM), Nov 2011, Glasgow, Scotland
[oral presentation]
[3] P. Das and R. K. Srihari, Learning To Summarize using Coherence, in Proceedings of NIPS Workshop on Applications for
Topic Models: Text and Beyond, Dec 2009, Whistler, Canada [poster presentation]
[2] P. Das and R. K. Srihari, Utterance Topic Models for Generating Coherent Summaries, in Proceedings of NIST Text Analysis
Conference, Nov 2009, Gaithersburg, MD [oral presentation]
[1] P. Das, R. K. Srihari and S. Mukund, Discovering Voter Preferences in Blogs using Mixtures of Topic Models, in Proceedings of
the Third Workshop on Analytics for Noisy Unstructured Text Data, Jul 2009, Barcelona, Spain [oral presentation]
Computer Skills
Programming Languages: Java, C++, MATLAB
Distributed and Large Data Processing Experience: Hadoop, Hive, MPI, Lucene
Scripting Experience: Shell scripts, Perl
Open-source software: Stanford CoreNLP Suite; personal code at www.buffalo.edu/~pdas3/software/software.html
Teaching Experience and Relevant Coursework
Teaching Assistant, CSE Dept., SUNY Buffalo, Buffalo, NY, USA (Fall 2006 to Fall 2010)
Relevant Courses as a Teaching Assistant
Information Retrieval (CSE535) [Fall 2009/10]
Machine Learning (CSE576) [Fall 2008]
Relevant Course Projects completed during Coursework
Parallel Latent Dirichlet Allocation (in C using MPI); Instructors: Vipin Chaudhary and Matthew Jones [Fall 2008]
Search engine architecture from scratch (in C++); Instructor: Rohini K. Srihari [Spring 2008]
Peer-to-peer file sharing application (in Java); Instructor: Murat Demirbas [Fall 2007]
Neural networks for spam classification (in MATLAB); Instructor: Matthew Beal [Fall 2006]
Awards and Honors
Research/Teaching Assistantship for PhD studies at SUNY Buffalo from Sep 2006 to present
Fellowship for the post of Visiting Research Fellow at the Indian Statistical Institute, Kolkata, India, Aug 2005 to Jul 2006
Certificate of merit and memento for ranking second (First Class) in MCA, Kolkata, India, Jan 2005
Top Performer Award for batch T-47 in the Initial Learning Program at TCS, Trivandrum, India, Nov 2004
Govt. of India National Scholarship based on BS results at Jadavpur University, Kolkata, India, Aug 2003
References: Available on request