Data Scientist

Chicago, Illinois, United States
January 25, 2017



+* (832)-***-****


Data scientist with 2.5 years of experience in analyzing large datasets using Machine Learning, Natural Language Processing, and Deep Learning. Demonstrated ability to develop data pipelines in a professional setting using Hadoop and its ecosystem.


Northern Illinois University (NIU)

DeKalb, IL (August 2015 - May 2017)

Master of Science, Computer Science – GPA: 3.60/4.00 (to date)

Thesis: Text Analytics and Visualization using Network Techniques with application to Human Behavior

Acharya Nagarjuna University (ANU)

Guntur, India (June 2010 - May 2014)

Bachelor of Technology, Computer Science – Major GPA: 3.30/4.00


Research Assistant

Dept. of Computer Science at NIU

DeKalb, IL

August 2015 – Present

*Collaborated with professors from diverse fields and answered hard questions by building supervised and unsupervised machine learning models.

*Extracted data from databases and wrote scripts to parse, clean, combine, and process it.

*Created dashboards and visualizations of processed data and identified trends and anomalies.

*Investigated data problems, identified patterns, and published the results.

*Used predictive analytics and machine learning to create new products or drive decision making in a project-oriented environment with aggressive deadlines.

*Derived inferences and conclusions and communicated results through reports, charts, or tables.

Data Analyst – Associate Engineer

Virtusa Software Services Pvt. Ltd.

Chennai, India

August 2014 – July 2015

*Extracted customer data from MySQL and Oracle databases.

*Optimized the data by performing data cleansing and data wrangling.

*Built data pipelines to enable real-time data analysis at scale.

*Created, optimized, and scheduled efficient MapReduce, Pig, and Hive jobs.

*Completed unit testing using JUnit, PigUnit, and MRUnit to ensure robustness.

*Followed standards and procedures for documentation.


Machine Learning: Classification, Regression, Clustering, and Feature Engineering.

Statistical Methods: Time series, regression models, hypothesis testing and confidence intervals, and dimensionality reduction using PCA and LDA.

Big Data tools: Hadoop (MapReduce, Hive, Pig, and Oozie).

Data Visualization: Tableau, seaborn, matplotlib, and Processing.

Software and Programming Languages: Python (pandas, numpy, scipy, networkx, Beautiful Soup, gensim, nltk, scikit-learn, xgboost, keras, tensorflow), R, Java, C, C++, SQL, PL/SQL, Oracle, Linux, Microsoft Excel, AWS, Git, and SVN.

Relevant Coursework: Network analysis, Modelling and Simulation, Data Science, Data Mining, Probability and Statistics, Linear Algebra.

Certifications: Machine Learning in Python, Dimensionality Reduction in Python, Data Scientist Toolbox in R.


*Predicting students’ performance in MOOCs using Classification and Network Analysis of forum data.

*Visual Analysis using alternative layouts for text-based networks.

*Allstate Claims Severity (Kaggle competition)

*San Francisco Crime Classification (Kaggle competition)

*Text Analytics and Visualization using Network Techniques with application to Human Behavior (Thesis)

*Query based document retrieval using KL Divergence.

*Consumer Complaint Analysis using Hadoop MapReduce.
