Divya Vasireddy
Email: **********@****.***.*** Mobile: 732-***-**** https://linkedin.com/in/divyavasireddy/
EDUCATION
ILLINOIS INSTITUTE OF TECHNOLOGY Chicago, IL
M.S., Data Science (GPA: 3.85/4) Dec 2017 (Expected)
Focus areas: Statistical Analysis, Time Series Analysis & Forecasting, Statistical Learning/Machine Learning, Advanced Data Mining, Advanced Database Organizations, Monte Carlo Methods, Online Social Network Analysis
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY Hyderabad, India
B.S., Computer Science (GPA: 3.66/4) May 2010
PROFESSIONAL EXPERIENCE
DISCOVERY HEALTH PARTNERS Chicago, IL
Data Science Intern May 2017 - Aug 2017
Worked with 3-person Data Science team to predict the number and amounts of claims for each patient service by applying Machine Learning classification techniques
Developed data preparation algorithm using Python and SQL to retrieve, aggregate, and vectorize data from 15 MySQL data warehouse tables (~250 GB data)
Leveraged out-of-core Stochastic Gradient Descent (SGD) classification in Python's scikit-learn package to handle the large volume of data and iteratively aggregate the results
Developed visualizations to present results to business teams using the Matplotlib and seaborn packages in Python
Tools used: PyCharm, GitHub, Python, MySQL
ADP (Fortune 500 Cloud provider of HR, Benefits, Tax Solutions for 650,000 clients globally) Hyderabad, India
Data Engineer Aug 2014 - Aug 2015
Part of DataCloud Innovation team responsible for developing applications that enable analytics on top of ADP payroll data; served clients in Financial Services, Real Estate, and Insurance industries
Created Python scripts to process JSON files and DataFrames in Spark to cleanse and prepare data in Hive tables
Developed custom-built Pig scripts to implement tools for the data science team
Leveraged DevOps methodologies to package and migrate code from Dev to Test and Prod environments
Optimized Hive data transfer scripts for parallel processing and worked on building an efficient data pipeline for data delivery
ADP Hyderabad, India
Big Data Developer Jun 2012 - Aug 2014
Developed Sqoop scripts to perform daily import and export of payroll data from Oracle database to HDFS
Developed Hive jobs to transform payroll data from Oracle database to HDFS
Advised project leader on key design decisions, including implementation of dynamic partitioning and bucketing for data processing in Hive
Developed ETL jobs over the course of multiple product releases
ADP Hyderabad, India
Software Engineer Jun 2011 - Jun 2012
Developed Java applications for ADP portal product used by clients across 110 countries; primarily contributed to business workflow management, messaging, and Identity & Access Management (IAM) applications
Developed Java applications using the Eclipse and MyEclipse IDEs and the Maven build tool for 3 consecutive Agile-based product releases
Developed complex SQL queries to retrieve data from Oracle and MySQL databases
Received ‘Certification of Appreciation’ and ‘You made a difference’ awards for two consecutive years at ADP for training and onboarding new hires across US and India teams
ACADEMIC PROJECTS
Github link for projects: https://github.com/Vasireddydivya
Santander Bank Product Recommendation (Course: Advanced Data Mining)
Objective: Build a better recommendation system for targeted advertising to customers
Contribution: Imputed missing values by finding the distribution/frequency of each feature using Python. Blended train and test data sets based on customer ID and applied the XGBoost algorithm to predict product trends
Impact: Recommendation model achieved a MAP@7 score above 0.3 (mean average precision over the top 7 products recommended to each customer)
Human Resources Analytics (Course: Data Preparation and Analysis)
Objective: Identify the most valuable employees, predict churn among them, and make recommendations to retain those most valuable employees
Contribution: Used different feature selection techniques to select the best features and identified Random Forest as the best-fitting model in terms of accuracy, precision, recall, and ROC curve. Created Tableau visualizations to present the analysis and results
Impact: Identified the top 33% most valuable employees among 15,000 employees with 99.7% accuracy
Performed Clustering on 20 Newsgroups and Yelp Data Sets (Course: Statistical Learning)
Objective: Perform K-means, LDA, and LSA clustering on the 20 Newsgroups and Yelp data sets (JSON files)
Contribution: Identified the best set of clusters using the ‘NbClust’ function in R. Implemented LSA and LDA using singular value decomposition for each data set; compared LSA and LDA performance based on the accuracy metric
Impact: Identified 3 major groups for each of the data sets with 88% accuracy
SKILLS AND CERTIFICATIONS
‘Machine Learning by Stanford University’ through Coursera (April 15th, 2017)
‘Neural Networks for Machine Learning by University of Toronto’ through Coursera (July 1st, 2017)
Software and Programming languages: Core Java, Python (Keras using TensorFlow, scikit-learn, NumPy, SciPy, Matplotlib), Tableau, Microsoft Excel, SQL, MATLAB, PuTTY, GitHub, SVN
Statistical Tools: R, ggplot2, dplyr, reshape