Data Scientist

Chicago, Illinois, United States
January 25, 2017

Data scientist with 2.5 years of experience analyzing large datasets using Machine Learning, Natural Language Processing, and Deep Learning. Demonstrated ability to develop data pipelines in a professional setting using Hadoop and its ecosystem.


Northern Illinois University (NIU)

DeKalb, IL (August 2015 - May 2017)

Master of Science, Computer Science – GPA: 3.60/4.00 (to date)

Thesis: Text Analytics and Visualization using Network Techniques with application to Human Behavior

Acharya Nagarjuna University (ANU)

Guntur, India (June 2010 - May 2014)

Bachelor of Technology, Computer Science – Major GPA: 3.30/4.00


Research Assistant

Dept. of Computer Science at NIU

DeKalb, IL

August 2015 – Present

*Collaborated with professors from diverse fields and answered hard questions by building supervised and unsupervised machine learning models.

*Extracted data from databases and wrote scripts to parse, clean, combine, and process it.

*Created dashboards and visualizations of processed data and identified trends and anomalies.

*Investigated data problems, identified patterns, and published the results.

*Used predictive analytics and machine learning to create new products or drive decision making in a project-oriented environment with aggressive deadlines.

*Derived inferences and conclusions, communicated results through reports, charts, or tables.

Data Analyst – Associate Engineer

Virtusa Software Services Pvt. Ltd.

Chennai, India

August 2014 – July 2015

*Extracted customer data from MySQL and Oracle databases.

*Optimized the data by performing data cleansing and data wrangling.

*Built data pipelines to enable data analysis at scale in real time.

*Created, optimized, and scheduled efficient MapReduce, Pig, and Hive jobs.

*Completed unit testing using JUnit, PigUnit, and MRUnit to ensure robustness.

*Followed standards and procedures for documentation.


Machine Learning: Classification, Regression, Clustering, and Feature Engineering.

Statistical Methods: Time series, regression models, hypothesis testing and confidence intervals, and dimensionality reduction using PCA and LDA.

Big Data tools: Hadoop (MapReduce, Hive, Pig, and Oozie).

Data Visualization: Tableau, seaborn, matplotlib, and Processing.

Software and Programming Languages: Python (pandas, numpy, scipy, networkx, Beautiful Soup, gensim, nltk, scikit-learn, xgboost, keras, tensorflow), R, Java, C, C++, SQL, PL/SQL, Oracle, Linux, Microsoft Excel, AWS, Git, and SVN.

Relevant Coursework: Network analysis, Modelling and Simulation, Data Science, Data Mining, Probability and Statistics, Linear Algebra.

Certifications: Machine Learning in Python, Dimensionality Reduction in Python, Data Scientist Toolbox in R.


*Predicting students’ performance in MOOCs using Classification and Network Analysis of forum data.

*Visual Analysis using Alternative layouts for text-based networks.

*Allstate Claims Severity (Kaggle competition)

*San Francisco Crime Classification (Kaggle competition)

*Text Analytics and Visualization using Network Techniques with application to Human Behavior (Thesis)

*Query based document retrieval using KL Divergence.

*Consumer Complaint Analysis using Hadoop MapReduce.
