Divya Vasireddy
Email: **********@****.***.*** | Mobile: 732-***-**** | https://linkedin.com/in/divyavasireddy/
SUMMARY
Over 4 years of experience as a Big Data Developer with strong Data Analysis and Machine Learning skills. Hands-on experience writing complex SQL queries to extract, transform, and load (ETL) data from large datasets. Professional experience with programming languages and tools such as Python, Hive, and Sqoop. Deep understanding of the Software Development Life Cycle (SDLC) as well as Agile/Scrum methodology to accelerate software development iterations.
EDUCATION
ILLINOIS INSTITUTE of TECHNOLOGY Chicago, IL
M.S., Data Science (GPA: 3.72/4) Dec 2017
Focus areas: Statistical Analysis, Time Series Analysis & Forecasting, Statistical Learning/Machine Learning, Advanced Data Mining, Advanced Database Organizations, Monte Carlo Methods, Online Social Network Analysis
JAWAHARLAL NEHRU TECHNOLOGY INSTITUTE Hyderabad, India
B.S., Computer Science (GPA: 3.66/4) May 2010
PROFESSIONAL EXPERIENCE
DISCOVERY HEALTH PARTNERS Chicago, IL
Data Science Intern May 2017 – Aug 2017
Worked with a 3-person Data Science team to predict the number and amounts of claims for each patient service by applying Machine Learning classification techniques
Developed data preparation algorithm using Python and SQL to retrieve, aggregate, and vectorize data from 15 MySQL data warehouse tables (~250 GB data)
Leveraged out-of-core Stochastic Gradient Descent (SGD) classification from Python's scikit-learn package to handle the large data volume, training incrementally on chunks and aggregating the results (see sketch below)
Developed visualizations to present results to business teams using Matplotlib and seaborn packages in Python.
Tools used: PyCharm, GitHub, Python, MySQL
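A minimal sketch of the out-of-core SGD training pattern referenced above, assuming pre-computed numeric features streamed in chunks; the file name and the 'claim_paid' label column are hypothetical:

```python
# Sketch: out-of-core classification with scikit-learn's SGDClassifier.
# File and column names are hypothetical; features are assumed to be
# numeric after the data-preparation step.
import pandas as pd
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(random_state=42)
classes = [0, 1]  # all labels must be declared on the first partial_fit call

# Stream the prepared data in chunks so the full dataset never sits in memory
for i, chunk in enumerate(pd.read_csv("claims_features.csv", chunksize=100_000)):
    X = chunk.drop(columns=["claim_paid"]).to_numpy()
    y = chunk["claim_paid"].to_numpy()
    clf.partial_fit(X, y, classes=classes if i == 0 else None)
```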
ADP (Fortune 500 cloud provider of HR, benefits, and tax solutions for 650,000 clients globally) Hyderabad, India
Data Engineer Aug 2014 - Aug 2015
Part of the DataCloud Innovation team responsible for developing applications that enable analytics on top of ADP payroll data; served clients in the Financial Services, Real Estate, and Insurance industries
Created Python scripts to process JSON files and DataFrames in Spark to cleanse and prepare data in Hive tables (see sketch below)
Developed custom Pig scripts implementing tooling for the data science team
Leveraged DevOps methodologies to package and migrate code from Dev to Test and Prod environments
Optimized Hive data transfer scripts for parallel processing and worked on building an efficient data pipeline for data delivery
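A minimal sketch of the JSON-to-Hive cleansing flow described above, written against the current PySpark API (SparkSession, which postdates the original work); paths, column names, and the target table are assumptions:

```python
# Sketch: cleansing JSON payroll extracts with PySpark and landing them in a
# Hive table. Paths, column names, and the target table are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("payroll-cleanse")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.read.json("hdfs:///data/raw/payroll/*.json")

clean = (raw
         .dropDuplicates(["employee_id", "pay_date"])
         .withColumn("pay_date", F.to_date("pay_date"))
         .filter(F.col("gross_pay").isNotNull()))

# Land the prepared data in a Hive table for downstream analytics
clean.write.mode("overwrite").saveAsTable("analytics.payroll_clean")
```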
ADP Hyderabad, India
Big Data Developer Jun 2012 - Aug 2014
Developed Sqoop scripts to perform daily data transfer of payroll data from Oracle database to HDFS
Developed Hive jobs to extract, transform, load (ETL) payroll data from Oracle database to HDFS
Advised the project leader on key design decisions, including implementing dynamic partitioning and bucketing for data processing in Hive (see sketch below)
Developed ETL jobs using Scrum Agile methodology over the course of multiple product releases
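A minimal sketch of the daily Oracle-to-HDFS transfer and Hive dynamic-partition load described above, driven from Python; the JDBC URL, credentials, table names, and paths are hypothetical:

```python
# Sketch: a daily Sqoop incremental import from Oracle to HDFS followed by a
# Hive dynamic-partition load. The JDBC URL, credentials, table names, and
# paths are hypothetical.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/PAYROLL",
    "--username", "etl_user", "--password-file", "/user/etl/.pw",
    "--table", "PAYROLL_TXN",
    "--target-dir", "/data/staging/payroll_txn",
    "--incremental", "append",
    "--check-column", "TXN_ID",
    "--last-value", "0",           # tracked from the previous run in the real job
    "--num-mappers", "4",
]
subprocess.run(sqoop_cmd, check=True)

hive_sql = """
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE payroll.txn PARTITION (pay_date)
SELECT txn_id, employee_id, gross_pay, pay_date
FROM payroll.txn_staging;
"""
subprocess.run(["hive", "-e", hive_sql], check=True)
```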
ADP Hyderabad, India
Software Engineer Jun 2011 – Jun 2012
Developed Java applications for ADP portal product used by clients across 110 countries; primarily contributed to business workflow management, messaging, and Identity & Access Management (IAM) applications
Developed Java applications using the Eclipse and MyEclipse IDEs with Jenkins and Maven for builds, across 3 consecutive Agile-based product releases
Developed complex SQL queries to retrieve data from Oracle and MySQL databases
Developed framework for automation testing using QTP and QC tools
Received ‘Certificate of Appreciation’ and ‘You Made a Difference’ awards for two consecutive years at ADP for training and onboarding new hires across US and India teams
ACADEMIC PROJECTS
Github link for projects: https://github.com/Vasireddydivya
Twitter Data: Text Classification and Graph Clustering (Course: Online Social Network Analysis – Python)
Objective: Build a classifier for sentiment analysis on Donald Trump tweets and cluster the followers of Elon Musk
Contribution: Collected 1,000 tweets matching the search term ‘Donald Trump’ and built GLM and SVM classifiers after cleaning the tweets. Collected Elon Musk’s followers and, for each follower, up to 200 of their followers, then identified communities using the Girvan-Newman algorithm (see sketch below).
Impact: Calculated the accuracy of each sentiment classifier and generated a follower graph with nodes colored by detected community.
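A minimal sketch of Girvan-Newman community detection with NetworkX; the tiny edge list below stands in for the real follower data collected from the Twitter API:

```python
# Sketch: community detection on a follower graph with the Girvan-Newman
# algorithm via NetworkX. The edge list is an illustrative stand-in for
# the collected follower data.
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.Graph()
G.add_edges_from([
    ("elonmusk", "alice"), ("elonmusk", "bob"),
    ("alice", "carol"), ("bob", "dave"), ("carol", "dave"),
])

# Girvan-Newman repeatedly removes the highest-betweenness edge; take the
# first split into communities
communities = next(girvan_newman(G))
for i, community in enumerate(sorted(communities, key=len, reverse=True)):
    print(f"community {i}: {sorted(community)}")
```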
HR Resources Analytics (Course: Data Preparation and Analysis)
Objective: Identify the most valuable employees, predict churn among them, and recommend ways to retain them
Contribution: Used several feature selection techniques to select the best features and identified Random Forest as the best-fitting model in terms of accuracy, precision, recall, and ROC curve (see sketch below); created Tableau visualizations to present the analysis and results
Impact: Identified top 33% most valuable employees among 15000 employees with 99.7% accuracy
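A minimal sketch of the Random Forest churn model and the metrics named above; the file name and column names mirror a typical HR analytics dataset and are assumptions here:

```python
# Sketch: Random Forest churn model with the metrics named above.
# 'hr_analytics.csv' and the 'left' churn label are hypothetical names.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

df = pd.read_csv("hr_analytics.csv")
X = pd.get_dummies(df.drop(columns=["left"]))  # one-hot encode categoricals
y = df["left"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))
print("ROC AUC  :", roc_auc_score(y_test, proba))

# Feature importances give a quick view of which features drive churn
print(pd.Series(model.feature_importances_, index=X.columns).nlargest(10))
```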
Development of the Guaranteed Automatic Integration Library (GAIL, open source) (Course: Monte Carlo Methods – MATLAB, Python)
Objective: GAIL is an open-source suite for integration problems in one and many dimensions, originally developed in MATLAB.
Contribution: Developed a Monte Carlo method for estimating the mean of a random variable based on the Central Limit Theorem, using NumPy and SciPy (see sketch below).
Impact: Gained hands-on expertise with the NumPy and SciPy packages in Python; the contribution adds value for the academic community at Illinois Institute of Technology.
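A minimal sketch of a CLT-based Monte Carlo mean estimator in the spirit of GAIL's mean-estimation routines; the tolerance, confidence level, and variance-inflation factor are illustrative assumptions:

```python
# Sketch: CLT-based Monte Carlo estimation of E[Y]. Tolerance, confidence
# level, and the variance-inflation factor are illustrative assumptions.
import numpy as np
from scipy import stats

def mean_mc_clt(sampler, abs_tol=0.01, alpha=0.05, n_pilot=1000, inflate=1.2):
    """Estimate the mean of a random variable to within abs_tol
    with approximate confidence 1 - alpha, via the Central Limit Theorem."""
    pilot = sampler(n_pilot)
    sigma = inflate * pilot.std(ddof=1)           # inflated variance estimate
    z = stats.norm.ppf(1 - alpha / 2)             # CLT critical value
    n = int(np.ceil((z * sigma / abs_tol) ** 2))  # sample size from the CLT bound
    return sampler(n).mean()

# Example: estimate E[X^2] for X ~ Uniform(0, 1); the exact answer is 1/3
rng = np.random.default_rng(7)
print(mean_mc_clt(lambda n: rng.uniform(0, 1, n) ** 2))
```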
SKILLS AND CERTIFICATIONS
‘Machine Learning’ by Stanford University through Coursera (April 15, 2017)
‘Neural Networks for Machine Learning’ by University of Toronto through Coursera (July 1, 2017)
Software and Programming Languages: Core Java, Python (Keras with TensorFlow, scikit-learn, NumPy, SciPy, Matplotlib), Tableau, Microsoft Excel, SQL, MATLAB, PuTTY, GitHub, SVN
Statistical Tools: R, ggplot2, dplyr, reshape