Data Scientist/Analyst

Location:

Philadelphia, PA

Salary:

80000

Posted:

March 05, 2018


Resume:

Rahul Dhakecha ******@****.*****.***

**** ****** **., ***, ************, PA 19139 551-***-****

EDUCATION

University of Pennsylvania, Philadelphia May, 2018

MS - Data Science (Dept of Computer & Information Science); CGPA: 3.56/4.0

Spring 2018 courses: Big Data Analytics

Fall 2017 courses: Software Systems, Elements of Probability Theory

Spring 2017 courses: Mathematical Statistics, Database and Information Systems, Convex Optimization

Fall 2016 courses: Machine Learning, Modern Data Mining, Engineering Economics

Sardar Vallabhbhai National Institute of Technology, India, 2015

Bachelor of Technology in Electrical Engineering; CGPA: 8.41/10.00

SKILLS

Languages: Python (NumPy, SciPy), R, scikit-learn, C++, SQL, NoSQL, MATLAB, CVX, LaTeX

Databases: MySQL, MongoDB

Web Languages: HTML, NodeJS

Statistics: Exploratory Data Analysis, Hypothesis testing, parameter estimation, ANOVA, Parametric & Non-Parametric tests

Data Technologies: Hadoop, Spark (PySpark), TensorFlow, MapReduce

Miscellaneous: Amazon Web Services, Docker

WORK EXPERIENCE

Data Scientist Intern, Sprint, Kansas Summer 2017

● Worked on text analytics in Customer Experience department; developed SQL queries to fetch text from Teradata

● Cleaned, analyzed and processed data from various surveys; applied topic modelling using Latent Dirichlet Allocation in R

● Successfully developed Naive Bayes model for 2-level classification of text reviews in Python

● Successfully integrated new algorithm with existing rule-based classification; updated Medallia keywords

Research Assistant, Dept of Computer Science, University of Pennsylvania (MySQL, Python, Spark) Aug 2017-Present

● Multi-threaded scraping of Goodreads website to fetch book reviews, ratings, etc. using Selenium and BeautifulSoup

● Built data pipeline, ETL and wrangled data, modelled Goodreads database & performed preliminary EDA

● Successfully framed and tested hypotheses; applied PCA to extract important groups of books preferred by users

DATA SCIENCE/MACHINE LEARNING PROJECTS

Influence Maximization in Social Networks (Independent Study with Prof Hamed Hassani; Python, Spark) Fall, 2017

● Successfully deployed BFS and PageRank algorithms in a distributed setting on social graph data of Stack Overflow

● Developed independent cascade model for modelling social network; tested network model on synthetic Kronecker graph

● Deployed Python code for modelling & learning a large-scale temporal network with the NETINF algorithm on MemeTracker data

Music Recommendation System (team of 4; SQL, Python, NoSQL) March-April, 2017

● Developed relational and nonrelational database instances on AWS using MySQL & MongoDB from multiple datasets

● Developed NodeJS framework to integrate database with front end application, developed using HTML and AngularJS

● Built efficient SQL queries to recommend music to a user based on overlapping music of other users

Predicting readmission probability for diabetes inpatients (R) October, 2016

● EDA and Cleaning of dataset; mitigated nonlinearity and heteroscedasticity by concave function transformations

● LASSO and Elastic Net used to determine important predictors; developed multiple linear regression model

SOFTWARE DEVELOPMENT PROJECTS

Penn Cloud - distributed mailing system (C++, team of 4) September, 2017

● Developed a cloud system with scalable and fault-tolerant key-value store with efficient replication

● System supports mail services and storage service with features like uploading, downloading and sharing large files

● Robust communication achieved via gRPC between central controller, front end and storage nodes

Distributed Chat System (C++) September, 2017

● Developed fully distributed, scalable client & server systems using UDP; multicast incorporated to deliver messages

● System supports various chat rooms along with unordered, FIFO and totally ordered multicast

DATA SCIENCE CASE STUDIES

Billion Dollar Billy Beane: Developed simple linear regression model to predict average salary of baseball player; performed model selection using forward, backward and all-subset methods with Cp and BIC criteria

Fuel Efficiency in Automobiles: Created and bootstrapped multiple linear regression model; accounted for categorical variables

Framingham Heart Study: Classified positive and negative patients for heart disease using logistic regression and Random Forests; results compared with Linear Discriminant Analysis and Naive Bayes
