Debpriya Seal
**** **** *** ******, *** G, Bloomington, IN 47408
Email: accwkr@r.postjobfree.com Cell: 209-***-****
GitHub: https://github.com/debseal Linkedin: www.linkedin.com/in/debpriyaseal/
SUMMARY
Certified Machine Learning developer from Stanford University.
Working knowledge of data mining techniques.
Expert in Java, R, Octave, Python (esp. SciPy, NumPy, NLTK), UNIX Shell Scripting, LaTeX.
Understanding of Hadoop, HBase and MapReduce programming.
Working knowledge of SSIS, SSRS and SAS.
Certified Teradata SQL Specialist, Informatica ETL and DataStage ETL Developer.
EDUCATION
M.S. Computer Science (3.4/4.0) May 2014 Indiana University Bloomington, IN, United States
Relevant Courses: Data Mining, Social Media Mining, Probabilistic Graphical Models, Machine Learning, Cloud
Computing
B.E. Computer Science (3.8/4.0) 2007 Rajiv Gandhi Technical University, Bhopal, India
GRADUATE PROJECTS
Project: Predict Tag for Questionnaires Nov 2013 – present
Language: Python 2.7 Data Volume: Training data: 1 Billion (7 GB); Test data: 48 Million (2 GB)
Started with TF-IDF to extract the top 5 words and predicted them as the tags.
Working with the k-Nearest Neighbors algorithm to achieve better accuracy.
Used Indiana University's Big Red II supercomputer (processing speed of one petaFLOPS) due to the
sheer volume of records.
Project: Cluster Galaxies Sept 2013 – Oct 2013
Language: Java 5/R Data Volume: 761K non-R-AGN galaxies, 7K radio centroids
The purpose of this research was to determine what causes an Active Galactic Nucleus (AGN) in a given
galaxy. In this project, AGN are identified by their brightness at radio wavelengths, hence R-AGN. The
goal was to compare galaxies that have R-AGN with those that do not.
Used the k-means clustering algorithm to find galaxies with R-AGN.
Used Indiana University's Big Red II supercomputer (processing speed of one petaFLOPS), again due to
the sheer volume of records.
Project: Hand-written Digit Recognition May 2013 – Aug 2013
Language: Octave Data Volume: Train Data: 60K images (http://yann.lecun.com/exdb/mnist/)
Used the MNIST database of hand-written digits.
The digits had been size-normalized and centered in a fixed-size image (20x20 pixels).
Used a 2-layer neural network (NN) with one hidden layer of 25 units.
Project: Detecting Fraudulent Transactions Oct 2013 – Nov 2013
Language: Java and R Data Volume: 400K
Data cleansing was a major challenge, since deciding whether to drop or keep incomplete records was not
straightforward.
Used k-means clustering, Naïve Bayes and the Iterative Dichotomiser 3 (ID3) algorithm to identify
potentially fraudulent transactions, then compared the results.
Project: Unidentified Flying Objects (UFO) Analysis Nov 2013 – Dec 2013
Language: Java, SQLite Data Volume: 62K
Analyzed UFO sightings worldwide to see whether they follow any pattern.
Analyzed whether sightings occur in any particular state or on any particular day.
Analyzed whether sightings are reported more often by men or by women.
Project: Stock Market Prediction Using Twitter Jan 2013 – May 2013
Language: Python 2.7 Data Volume: 0.1 million tweets/week
Pulled tweets for specific company tickers using the Twitter API.
Prepared a financial lexicon using Natural Language Processing (NLP).
Built a classifier to label each tweet as positive, negative or neutral.
PROFESSIONAL EXPERIENCE
Associate Instructor, Indiana University, Bloomington IN Aug 2013 – present
Lead weekly labs for ~40 undergraduate students in I211 (Information Infrastructure II).
Guide students in learning Python.
Junior Analyst, MedSolutions, Franklin TN May 2013 – Aug 2013
Used SSIS and SSRS to develop a scoring system for every Post-Acute-Care facility.
Used SAS to analyze claims data and identify physicians with incorrect referrals.
Used SAS to find physicians who refer patients for unnecessary scans or to very expensive
facilities.
Analyst Programmer, Accenture Services Private Limited, India (Chennai) June 2007 – June 2012
Over 5 years of programming and design experience in data warehousing for the Retail and Healthcare
domains, providing Business Intelligence (BI) solutions for clients.
Helped clients design an Enterprise/Unified Data Warehouse (EDW), a single version of truth, with
BI analytical capabilities in the Informatics workspace.
Performed SQL performance tuning and query optimization.
Involved in ETL design, development and unit testing of code in the ETL tool, based on the technical design.
Coordinated with offshore and onshore colleagues and addressed outstanding issues requiring
functional or technical clarification.
Managed a team of 4 resources for both testing and development.
Understood functional requirements and provided technical design inputs.
Built an asset for the client to meet specific business requirements using C++ in DataStage.
HONORS AND AWARDS
Won the Bonnaroo 2013 Hackathon in Nashville, TN.
ACE (Accenture Celebration Excellence) Award in 2009 & 2012.
ASE (Associate Software Engineer) Achiever’s Award in 2008.
EXTRACURRICULAR ACTIVITIES
Represented Indiana University, Bloomington, up to the regional level in 9-Ball Pool, 2013.
For Accenture, conducted training on the DataStage and Informatica ETL tools across India, maintaining a
rating of 4 or above (out of 5).