Project Data

Location:
Bloomington, IN
Posted:
March 01, 2014

Resume:

Debpriya Seal

**** **** *** ******, *** G, Bloomington, IN 47408

Email: accwkr@r.postjobfree.com Cell: 209-***-****

GitHub: https://github.com/debseal LinkedIn: www.linkedin.com/in/debpriyaseal/

SUMMARY

Certified Machine Learning developer from Stanford University.

Working knowledge of data mining techniques.

Expert in Java, R, Octave, Python (esp. SciPy, NumPy, NLTK), UNIX shell scripting, and LaTeX.

Understanding of Hadoop, HBase and Map/Reduce programming.

Working knowledge of SSIS, SSRS and SAS.

Certified Teradata SQL Specialist, Informatica ETL and DataStage ETL Developer.

EDUCATION

M.S. Computer Science (3.4/4.0) May 2014 Indiana University Bloomington, IN, United States

Relevant Courses: Data Mining, Social Media Mining, Probabilistic Graphical Models, Machine Learning, Cloud Computing

B.E. Computer Science (3.8/4.0) 2007 Rajiv Gandhi Technical University, Bhopal, India

GRADUATE PROJECTS

Project: Predict Tag for Questionnaires Nov 2013 – present

Language: Python 2.7 Data Volume: Training data: 1 billion (7 GB); Test data: 48 million (2 GB)

Started with TF-IDF to select the top 5 words and predicted them as the tags.

Working with the k-nearest neighbors algorithm to achieve better accuracy.

Used the Indiana University supercomputer Big Red II (processing speed of one petaFLOPS) due to the sheer volume of records.

Project: Cluster Galaxies Sept 2013 – Oct 2013

Language: Java 5/R Data Volume: 761K non-R-AGN galaxies, 7K radio centroids

The purpose of this research was to determine what causes an Active Galactic Nucleus (AGN) in a given galaxy. In this project, AGN are identified by their brightness at radio wavelengths, hence R-AGN. The goal was to compare galaxies that have R-AGN with those that do not.

Used the k-means clustering algorithm to find galaxies with R-AGN.

Used the Indiana University supercomputer Big Red II (processing speed of one petaFLOPS) again due to the sheer volume of records.

Project: Hand-written digit recognition. May 2013 – Aug 2013

Language: Octave Data Volume: Train Data: 60K images (http://yann.lecun.com/exdb/mnist/)

Used the MNIST database of hand-written digits.

The digits had been size-normalized and centered in a fixed-size image (20x20 pixels).

Used a 2-layer neural network (NN) with 1 hidden layer of 25 units.

Project: Detecting Fraudulent Transactions Oct 2013 – Nov 2013

Language: Java and R Data Volume: 400K

Data cleansing was a major challenge, as deciding whether to drop or keep incomplete records was not straightforward.

Used k-means clustering, Naïve Bayes, and Iterative Dichotomiser 3 (ID3) algorithms to identify potentially fraudulent transactions, then compared the results.

Project: Unidentified Flying Objects (UFO) Analysis Nov 2013 – Dec 2013

Language: Java, SQLite Data Volume: 62K

Analyzed UFO sightings (worldwide) to see whether they follow any pattern.

Analyzed whether sightings cluster in any particular state or on any particular day.

Analyzed whether sightings are reported more often by men or by women.

Project: Stock Market Prediction using Twitter Jan 2013 – May 2013

Language: Python 2.7 Data Volume: 0.1 million tweets/week

Using the Twitter API, pulled in tweets for specific company tickers.

Used NLP (Natural Language Processing) to prepare a financial lexicon.

Built a classifier to label each tweet as positive, negative, or neutral.

PROFESSIONAL EXPERIENCE

Associate Instructor, Indiana University, Bloomington IN Aug 2013 – present

Lead weekly labs of ~40 undergraduate students for I211-Information Infrastructure II.

Guide students in learning Python.

Junior Analyst, MedSolutions, Franklin TN May 2013 – Aug 2013

Used SSIS and SSRS tools to develop a scoring system for every Post-Acute-Care facility.

Used SAS to analyze claims data and identify physicians with incorrect referrals.

Used SAS to find physicians who refer patients for unnecessary scans or to very expensive facilities.

Analyst Programmer, Accenture Services Private Limited, India (Chennai) June 2007 – June 2012

Over 5 years of programming and design experience in data warehousing in the Retail and Healthcare domains, providing Business Intelligence (BI) solutions for clients.

Helped clients design an Enterprise/Unified Data Warehouse (EDW), or single version of truth, with BI analytical capabilities in the Informatics workspace.

Performed SQL performance tuning and query optimization.

Involved in ETL design, development, and unit testing of code in the ETL tool based on the technical design.

Coordinated with offshore and onshore colleagues and addressed outstanding issues requiring functional or technical clarification.

Managed a team of 4 resources for both testing and development.

Analyzed functional requirements and provided technical design inputs.

Built an asset for the client to meet specific business requirements using C++ in DataStage.

HONORS AND AWARDS

Won the Bonnaroo 2013 Hackathon in Nashville, TN.

ACE (Accenture Celebration Excellence) Award in 2009 & 2012.

ASE (Associate Software Engineer) Achiever’s Award in 2008.

EXTRACURRICULAR ACTIVITIES

Advanced to regionals representing Indiana University, Bloomington in 9-Ball Pool, 2013.

For Accenture, conducted training on DataStage and Informatica ETL tools across India, maintaining a rating of 4 or above (out of 5).
