Nupur Gulalkari
** ******** **, *** ****, CA ****4 +1-617-***-**** ************@*****.***
https://github.com/nupur1492
Blog - https://nupur1492.github.io/
https://www.linkedin.com/in/nupurgulalkari/
Aspiring Data Scientist; proficient in SQL, Python, R, Spark and Tableau. Over 12 months of professional experience in machine learning, statistics and predictive analytics.
PROFESSIONAL EXPERIENCE
Persistent Systems, Santa Clara, CA – Data Science Intern Feb 2017 - Current
● Extracted and analyzed metadata from a source code management tool to find and group files that are change-dependent on each other.
● Developed a tool that provides recommendations and identifies code review experts based on these correlated files and their respective authors and reviewers, to prevent integration bugs.
● The recommendation engine reduced development time by 20% and bug points by 15%.
● Used Java for data extraction and Apache Spark (R, Scala APIs) for data analysis and visualization.
Quant University, Boston, MA – Data Science Intern Sept 2016 - Jan 2017
● Performed data wrangling and modeling on loan data provided by the client to predict the loan amount a customer is likely to take, as well as loan eligibility, based on parameters such as monthly income, education and family.
● Implemented and tested multiple algorithms like Regression, Decision Trees, Random Forests and clustering to find the best model for prediction.
● Slashed loan approval time and complexity by automating the prediction of loan amount and acceptability.
● Used Hive and Impala as data extraction tools and Spark, R and Tableau for analysis and visualizations.
● Conducted an Apache Spark workshop for more than 100 professionals.
Costco Wholesale, Seattle, WA – Data Science Intern June 2015 - Dec 2015
● Implemented Market Basket Analysis using the Apriori algorithm to identify obscure associations among items that are purchased together and to create recommendations based on the insights.
● Implemented Sentiment Analysis using the Twitter API to understand the sentiment of Costco users and use the insights to further the sales of Costco products.
● Used Hive and Impala to extract data, R for analysis, and Shiny for a web application that displays the recommendations.
CDAC, India – Machine Learning Intern Aug 2013 - May 2014
● Improved handwriting recognition accuracy for Hindi language by 92% by implementing a self-learning handwriting recognition system.
● Studied and experimented with various supervised algorithms and implemented Dynamic Time Warping for Hindi language handwriting recognition using MATLAB.
EDUCATION
Northeastern University - MS in Information Systems, MA Sept 2014 - Aug 2016
Focus Area: Data Science and Machine Learning, Data Warehousing, Business Intelligence 3.3
University of Pune, India – Bachelor of Engineering, Computer Engineering Aug 2010 - Jun 2014
Focus Area: Advanced DBMS and Data Mining, Data Structures and Algorithms First Class
TECHNICAL SKILLS
Programming Languages: R, Python, Scala, Java, C/C++
Databases: SQL, Hive Query Language, Impala, MySQL, PostgreSQL
Machine Learning Skills: Classification and Regression, Association Rule mining, Clustering, Neural networks, Time series modelling, SVM, Decision Trees, Ensemble Methods
Big Data Tools: Apache Spark, Tableau, QlikView, Power BI, Microsoft Azure, Machine Learning Studio
HOBBY, OPEN SOURCE & CLASS PROJECTS
Fraud Analysis using Enron data - Python (Naive Bayes, Decision Trees, SVM, K-means)
● Analyzed Enron dataset to predict persons of interest in the fraud based on factors like email chains, salaries and bonuses.
● Implemented multiple machine learning algorithms (KNN, Decision Trees, Support Vector Machines, Naive Bayes, Regression and K-means clustering) using the scikit-learn library in Python.
● Used PCA for dimension reduction and outlier detection.
NYC Subway Ridership Analysis - Python (Linear and OLS regression)
● Analyzed number of subway riders based on time of the day, day of week, weather conditions like rain, wind and temperature, etc.
● Performed the analysis using Linear and OLS regression, implemented in Python using sklearn, pandas, scipy and numpy.
Handwriting Character Recognition - R (Logistic Regression, SVM)
● Studied handwritten character recognition using the MNIST data (Kaggle competition).
● Identified handwritten digits using Logistic Regression and Random Forests with an accuracy of 75%.
Exploratory Data Analysis using Facebook Data - R
● Analyzed a Facebook dataset using R, while studying Exploratory Data Analysis on Udacity, a course designed by Facebook.
● Worked with the R packages ggplot2, dplyr, reshape2 and tidyr to plot histograms and multi-variable scatterplots, and learned how these can be used to perform a basic exploratory analysis of a dataset.