SAYANTAN DASGUPTA
Email: ********@***.***, ********.********@*****.***
UC Irvine Campus, CA - 92617
Mobile no: 949-***-****
EDUCATION
University of California Irvine 2011-2013
Master in Computer Science
Coursework: 'Fundamental Algorithm', 'Computer Architecture', 'Information
Retrieval', 'Visual Computing', 'Machine Learning', 'Probabilistic
Learning', 'Data Management System', 'Bio-Informatics', 'Bayesian
Statistics'
Indian Institute of Technology, Kharagpur 2003-2008
Bachelor & Master in Electrical Engineering
PROGRAMMING SKILLS
C, C++, Java, HTML, PHP/JavaScript, JSON, PL/SQL, Lucene, Solr, MATLAB,
R, Python, mlpy, NumPy, SciPy, NLTK, Amazon EC2, OpenMP, Hadoop, Greenplum
MPP, MADlib, GraphLab, MySQL, PostgreSQL, Unix, Mac OS X
DATA SCIENCE INTERN at Greenplum Inc
Neural Network for Sparse Data in MPP Database June - September 2012
. Implemented back-propagation with incremental training for a multilayer
neural network in the Greenplum MADlib library (C++), running on a
Greenplum MPP platform cluster (96 nodes, each with a 6-core 3.3GHz Intel
Xeon; 50GB shared RAM)
. Used selective weight updates in back-propagation to reduce the
complexity of training on sparse data
PROFESSIONAL EXPERIENCE
Quantitative Analyst, Credit Suisse Business Analytics India 2010-2010
. Member of the Quantitative Risk Management team
. Implemented VaR and statistical delta risk hedging models for commodity
futures and options, based on the Black-Scholes-Merton pricing model
. Validated the models against futures prices, interest rates, option
prices, etc. for each trading day over a 5-year interval
Design Engineer (Texas Instruments India) 2008-2010
. Member of the imaging software development team for OMAP4™, a
high-performance multimedia application device for smartphones
. Designed and implemented the software framework, wrote manuals, met
challenging performance targets for the advanced imaging algorithms, and
validated them on various platforms and hardware boards
. Implemented multimedia frameworks such as OpenMAX for video decoders
such as H.264 and VP6
MS THESIS WORK
Graphical Model for Recommendation System
. Created a graphical model for collaborative filtering with explicit
feedback, e.g., where ratings are available on a scale of 1-5 or 1-10
. Defined a pairwise Markov network in which each node represents a user
and each pairwise CPD represents the counts of the different ratings the
two users have in common, and implemented a parallel version of loopy BP
for inference
. Implemented the code in C++ with parallel inference using OpenMP, and
ran it on an Amazon EC2 machine (32 virtual CPUs with 60GB RAM)
ACADEMIC PROJECT WORK
Design of a Credit Scoring System based on Random Forest Tree (Kaggle
Leaderboard)
. Designed a credit scoring system based on a random forest to identify
the borrowers who are more likely to default, based on their income,
debt ratio and credit history
. The random forest classifier gave around 94% accuracy under cross
validation, and an 86% ROC rate
. Our team UCI_combination came 6th out of 925 teams in the Kaggle "Give Me
Some Credit" competition.
Design of a Distributed Database Management System (CiteSeer Link)
. Designed and implemented a homogeneous distributed database, using MySQL
servers as the individual local database servers and Java-based software
as a middle layer.
. Data stored on MySQL servers at different locations could be fetched
with a single query through the DDBMS
. The middle layer took queries from users, parsed each query, sent the
fragmented queries to the individual servers, retrieved the results from
them, joined them and returned the final result to the user. The query
processing remained transparent to the user, as if all of the data were
located on a single server.
Parsing, Alignment & Modeling of Genome Sequence (codebase)
. Parsed genome expressions in FASTA file format to detect the genes
present in a genome, using Python's regular expression libraries
. Globally aligned two genes using the dynamic programming
(Needleman-Wunsch) algorithm.
. Implemented the Viterbi algorithm and posterior decoding in Python, a
technique similar to entity resolution, for detecting the type and origin
of the genes.
Automated Rating Prediction of Yelp Review (slides)
. Built an ordinal regression model for predicting the rating of a Yelp
review, on a scale of 1 to 5, from the review text
. The ordinal regression is done through SVMs. We fit one SVM hyperplane
between each pair of successive ratings, for a total of 4 SVMs to separate
all 5 ratings
. The entire Yelp dataset contained around 230,000 ratings. We used 90% of
the data for training and 10% for validation, and performed 10-fold cross
validation. The worst-case prediction accuracy is 86%, along with an RMSE
of 0.42.
Design of Search Engine using Lucene
. Designed a Lucene-based, space-optimized inverted index for our
department website (ics.uci.edu)
. Implemented positional indexing to enable phrase queries consisting of
any number of terms
. Used anchor text mining to improve the NDCG from 0.4 to 0.67
. Developed a JSP-based GUI for search query input
EXTRACURRICULAR ACTIVITIES
. Helped raise funds for admitting children from slum areas to schools,
as a member of Texas Instruments India Foundation (TIIF)
. Taught children in slum areas on the outskirts of Bangalore, as a
volunteer in the Teach India program by The Times of India