
Data Scientist

Portland, Oregon, United States
January 04, 2018


Divya Vasireddy Mobile: (732).***.****


Over 4 years of experience as a Big Data Developer with excellent Data Analysis and Machine Learning skills. Hands-on experience writing complex SQL queries to extract, transform, and load (ETL) data from large datasets. Professional working experience with programming languages and tools such as Python, Hive, and Sqoop. Deep understanding of the Software Development Life Cycle (SDLC) as well as Agile/Scrum methodology to accelerate software development iterations.



M.S., Data Science (GPA: 3.72/4) Dec 2017

Focus areas: Statistical Analysis, Time Series Analysis & Forecasting, Statistical Learning/Machine Learning, Advanced Data Mining, Advanced Database Organizations, Monte Carlo Methods, Online Social Network Analysis


B.S., Computer Science (GPA: 3.66/4) May 2010



Data Science Intern May 2017- Aug 2017

Worked with a 3-person Data Science team to predict the number and amounts of claims for each patient service by applying Machine Learning classification techniques

Developed data preparation algorithm using Python and SQL to retrieve, aggregate, and vectorize data from 15 MySQL data warehouse tables (~250 GB data)

Leveraged the out-of-core Stochastic Gradient Descent (SGD) classification technique in Python's scikit-learn package to handle a large volume of data and iteratively aggregate the results

Developed visualizations to present results to business teams using Matplotlib and seaborn packages in Python.

Tools used: PyCharm, GitHub, Python, MySQL
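The out-of-core SGD approach above can be sketched with scikit-learn's `partial_fit`, which trains on one chunk at a time so the full dataset never has to fit in memory. This is a minimal illustration with a synthetic chunk generator standing in for batched reads from the MySQL warehouse; the data and labeling rule are invented for the example.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical chunk generator standing in for batched reads from MySQL;
# in practice each chunk would come from a paginated SQL query.
def iter_chunks(n_chunks=10, chunk_size=200, n_features=5, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, n_features))
        # Simple synthetic rule: positive class when feature sum is positive
        y = (X.sum(axis=1) > 0).astype(int)
        yield X, y

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all classes must be declared on the first partial_fit

# Each chunk updates the model incrementally instead of refitting from scratch
for X_chunk, y_chunk in iter_chunks():
    clf.partial_fit(X_chunk, y_chunk, classes=classes)

X_test, y_test = next(iter_chunks(n_chunks=1, seed=42))
print(clf.score(X_test, y_test))
```

Because only one chunk is in memory at a time, the same loop scales to data far larger than RAM, which is the point of the out-of-core pattern.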

ADP (Fortune 500 Cloud provider of HR, Benefits, Tax Solutions for 650,000 clients globally) Hyderabad, India

Data Engineer Aug 2014 - Aug 2015

Part of DataCloud Innovation team responsible for developing applications that enable analytics on top of ADP payroll data; served clients in Financial Services, Real Estate, and Insurance industries

Created Python scripts to process JSON files and DataFrames in Spark to cleanse and prepare data in Hive tables

Developed custom-built Pig scripts implementing tools for the data science team

Leveraged DevOps methodologies to package and migrate code from Dev to Test and Prod environments

Optimized Hive data transfer scripts for parallel processing and worked on building an efficient data pipeline for data delivery
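The cleanse-and-prepare step described above can be sketched in plain pandas (used here in place of Spark, which needs a cluster); the JSON-lines payload and its field names are hypothetical stand-ins for the payroll records.

```python
import io
import pandas as pd

# Hypothetical JSON-lines payroll extract; field names are illustrative only
raw = io.StringIO(
    '{"emp_id": 1, "salary": "55000", "dept": "HR"}\n'
    '{"emp_id": 2, "salary": null, "dept": "IT"}\n'
    '{"emp_id": 2, "salary": null, "dept": "IT"}\n'
)

df = pd.read_json(raw, lines=True)

# Cleanse: drop duplicate records, coerce salary to numeric, fill missing values
df = df.drop_duplicates()
df["salary"] = pd.to_numeric(df["salary"], errors="coerce").fillna(0)

print(len(df))  # → 2
```

In Spark the same steps map onto `spark.read.json`, `dropDuplicates`, and column casts before writing the result into a Hive table.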

ADP Hyderabad, India

Big Data Developer Jun 2012 - Aug 2014

Developed Sqoop scripts to perform daily data transfer of payroll data from Oracle database to HDFS

Developed Hive jobs to extract, transform, load (ETL) payroll data from Oracle database to HDFS

Advised the project leader on key design decisions, including implementing dynamic partitioning and bucketing for data processing in Hive

Developed ETL jobs using Scrum Agile methodology over the course of multiple product releases
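A minimal HiveQL sketch of the dynamic-partitioning and bucketing pattern mentioned above; the table and column names are hypothetical, not taken from the actual payroll schema.

```sql
-- Dynamic partitioning is restricted in strict mode by default
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Partition by pay period; bucket by employee id for efficient joins and sampling
CREATE TABLE payroll_processed (
    emp_id    BIGINT,
    gross_pay DECIMAL(12,2)
)
PARTITIONED BY (pay_period STRING)
CLUSTERED BY (emp_id) INTO 16 BUCKETS
STORED AS ORC;

-- With dynamic partitioning, Hive routes each row to its partition
-- using the trailing SELECT column(s)
INSERT OVERWRITE TABLE payroll_processed PARTITION (pay_period)
SELECT emp_id, gross_pay, pay_period
FROM payroll_staging;
```

Partitioning prunes whole directories at query time, while bucketing gives a stable hash layout that speeds up joins on the bucketed key.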

ADP Hyderabad, India

Software Engineer Jun 2011 – Jun 2012

Developed Java applications for ADP portal product used by clients across 110 countries; primarily contributed to business workflow management, messaging, and Identity & Access Management (IAM) applications

Developed Java applications using tools including Eclipse, MyEclipse, Jenkins, and Maven, across 3 consecutive Agile-based product releases

Developed complex SQL queries to retrieve data from Oracle and MySQL databases

Developed framework for automation testing using QTP and QC tools

Received ‘Certification of Appreciation’ and ‘You made a difference’ awards for two consecutive years at ADP for training and onboarding new hires across US and India teams


GitHub link for projects:

Twitter Data: Text Classification and Graph Clustering (Course: Online Social Network Analysis - Python)

Objective: Build a classifier for sentiment analysis of Donald Trump tweets and cluster the followers of Elon Musk

Contribution: Collected 1,000 tweets containing the search term 'Donald Trump' and built GLM and SVM classifiers after cleaning the tweets. Collected Elon Musk's followers and, for each follower, their 200 followers, to identify communities using the Girvan-Newman algorithm.

Impact: Calculated the accuracy of each sentiment-analysis classifier developed above. Generated a graph with nodes colored by community to visualize the detected communities.
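The Girvan-Newman step can be sketched with NetworkX's built-in implementation. A toy barbell graph (two tight clusters joined by one bridge edge) stands in for the collected follower graph, which is not reproduced here.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Toy follower graph: two 5-node cliques joined by a single bridge edge
G = nx.barbell_graph(5, 0)

# Girvan-Newman repeatedly removes the highest-betweenness edge;
# the first yielded partition is the initial two-community split
first_partition = next(girvan_newman(G))
communities = [sorted(c) for c in first_partition]
print(communities)  # → [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

The bridge edge has the highest betweenness, so removing it immediately separates the two clusters, which is exactly what the algorithm exploits.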

HR Resources Analytics (Course: Data Preparation and Analysis)

Objective: Identify the most valuable employees, predict churn among them, and make recommendations to retain those most valuable employees

Contribution: Used different feature-selection techniques to select the best features and identified Random Forest as the best-fitting model in terms of accuracy, precision, recall, and the ROC curve. Created Tableau visualizations to present the analysis and results.

Impact: Identified the top 33% most valuable employees among 15,000 employees with 99.7% accuracy
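The model-selection step above can be sketched with scikit-learn's Random Forest. The HR dataset is not reproduced here, so a synthetic classification problem stands in for it, and feature importances illustrate the feature-selection angle.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the HR dataset (the real features, e.g. satisfaction
# level or hours worked, are not reproduced here)
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
# Importances rank features, supporting the feature-selection step above
top_features = model.feature_importances_.argsort()[::-1][:3]
print(acc, top_features)
```

Precision, recall, and ROC-AUC would come from `sklearn.metrics` in the same way, letting models be compared across all four criteria as described.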

Development of Guaranteed Automatic Integration Library (GAIL, open source) (Course: Monte Carlo Methods - MATLAB, Python)

Objective: GAIL is an open-source suite for integration problems in one and many dimensions, originally developed in MATLAB.

Contribution: Developed a Monte Carlo method for estimating the mean of a random variable based on the Central Limit Theorem, using NumPy and SciPy.

Impact: Gained hands-on expertise with the NumPy and SciPy packages in Python and added value to the academic community at Illinois Institute of Technology.
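A minimal sketch of the CLT-based Monte Carlo mean estimator described above, using NumPy and SciPy. This is an illustration of the general technique, not the actual GAIL implementation: a pilot sample estimates the variance, which fixes the sample size the CLT suggests for a target tolerance.

```python
import numpy as np
from scipy import stats

def mean_mc_clt(rand_fn, abs_tol=0.005, alpha=0.05, n_pilot=1000, seed=0):
    """Estimate E[Y] to within abs_tol with ~(1 - alpha) confidence via the CLT.

    A pilot sample estimates sigma, giving the CLT-based sample size
    n >= (z_{1-alpha/2} * sigma / abs_tol)^2.
    """
    rng = np.random.default_rng(seed)
    pilot = rand_fn(rng, n_pilot)
    sigma = pilot.std(ddof=1)
    z = stats.norm.ppf(1 - alpha / 2)
    n = max(n_pilot, int(np.ceil((z * sigma / abs_tol) ** 2)))
    sample = rand_fn(rng, n)
    return sample.mean(), n

# Example: the mean of U(0, 1) is 0.5
est, n = mean_mc_clt(lambda rng, size: rng.uniform(0, 1, size))
print(est, n)
```

For U(0, 1), sigma is about 0.289, so the CLT bound calls for roughly 12,800 samples at this tolerance; the estimate lands close to 0.5.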


‘Machine Learning’ by Stanford University, through Coursera (April 15, 2017)

‘Neural Networks for Machine Learning’ by University of Toronto, through Coursera (July 1, 2017)

Software and Programming Languages: Core Java, Python (Keras with TensorFlow, scikit-learn, NumPy, SciPy, Matplotlib), Tableau, Microsoft Excel, SQL, MATLAB, PuTTY, GitHub, SVN

Statistical Tools: R, ggplot2, dplyr, reshape
