Data Professional Experience

Ashburn, Virginia, United States
May 18, 2018

Machine Learning: Classification, Regression, Clustering, Deep Learning, Cross-Validation

Statistical Methods: Hypothesis Testing, A/B Testing, Principal Component Analysis, ANOVA, GLM

Programming Languages: Python (scikit-learn, pandas, numpy, tensorflow), R, SQL, R Shiny, Excel, ggplot2

Professional Experience

Varian Medical Systems, Data Science Intern Palo Alto, CA May 2017 - Sept 2017

• Predicted the volume of voids in ‘Targets’ with 78% accuracy using statistical modeling and machine learning techniques, to help prevent damage to the ion chamber.

• Improved the accuracy of MLC leaves to 0.01 mm using image recognition (object detection) techniques, R-CNN and Fast R-CNN, in TensorFlow (Python).

• Mined massive amounts of data (~500 GB) and performed large-scale data analysis to derive useful production and engineering insights into product behavior, then presented them to a non-technical audience.

• Analyzed different machines quantitatively and qualitatively by plotting patterns (data visualization) of their usage using ggplot2 to understand their respective behavior.

Northeastern University, Teaching Assistant (Statistics) Boston, MA Sept 2017 - Present

• Guided a class of 44 students during tutoring hours; graded assignments and quizzes in a timely manner.

• Explained several topics in Descriptive Statistics, Inferential Statistics and Probability (Probability Distributions, Hypothesis Testing, Multiple Linear Regression, ANOVA).

Projects

Predicted Tips for NYC taxis (R) - link Mar 2017 - Apr 2017

• Processed a large 255 MB data set (1.3 million observations) for analysis.

• Implemented machine learning and statistical models (RPART, Random forest, and regularized regression) to predict the tip percentage received from customers.

• Achieved 92% accuracy with Random Forest after comparing the models on MSE and RMSE.

• Created interactive visualizations of tip rates using data visualization package ggplot2.

Facial Recognition using PCA (Python) - link Jan 2017 - Feb 2017

• Optimized the number of PCA components, retaining 94% of the total variance in the data.

• Achieved a 92% success rate with SVM, using F1 score as the measure.

Predicted risk category for each customer (R) - link Oct 2016 - Dec 2016

• Built a predictive model that classified risk using SVM, Random Forest, and logistic regression, along with feature engineering.

• Generated predictions with an accuracy of 68% using feature extraction and a Random Forest model.

Built a Health Insurance Company Database - link Jan 2017 - Mar 2017

• Gathered and identified all the key entities and relations, then designed the database using Toad Data Modeler (MS SQL Server).

• Built an interactive dashboard application using R Shiny.

• Performed analysis and presented results using SQL.

Education

Northeastern University, Boston, MA GPA 3.9 May 2018 Master of Science in Operations Research

Relevant Coursework: Machine Learning, Data Mining, Deep Learning, Probability, Statistics, Data Management and Database Design, Probabilistic Operations Research, Deterministic OR

Udacity – Machine Learning Nanodegree In progress