
Data Analysis, Data mining, R, Python, SQL, Tableau, Databricks, Spark

Boston, MA
February 29, 2020




Boston, MA 214-***-****

SUMMARY

Aspiring data scientist with a passion for exploring data and finding valuable insights, seeking an opportunity to apply analytical and statistical skills to inform business decisions.

EDUCATION

Northeastern University, Boston, MA [expected Mar 2021]

M.P.S. in Analytics with concentration in Statistical Modelling

Relevant Courses: Probability and Statistics, Enterprise Analytics, Data Mining Applications, Predictive Analytics, Communication & Visual Data Analysis, Data Management & Big Data

University of Pune, Pune, India 2018

Bachelor of Computer Engineering


Programming Languages: R, Java, Python (NumPy, Pandas, Matplotlib)

Machine Learning: Linear and Logistic Regression, Clustering, Random Forest, Decision Trees, Neural Networks, Naïve Bayes

Databases: MySQL, MongoDB

Software and Tools: MS Excel, MS PowerPoint, Eclipse, Android Studio, RStudio, Microsoft Azure, Databricks, Spark

Visualization Tools: Tableau, RShiny


Feature Importance of US Accidents (Databricks, SparkR), Northeastern University Jan 2020 - Feb 2020

• Created a Databricks cluster to process a dataset of 3 million records

• Visualized the occurrence of accidents by state, hour, month, and year

• Identified stops and bumps as important features using the Random Forest algorithm from the SparkR library

Visualization of Video Game Sales (Tableau Dashboard, RShiny), Northeastern University Oct 2019 - Nov 2019

• Created visual representation of sales analysis based on genre, top publishers, games, and platforms

• Identified gaming sales patterns around the globe

NYC Airbnb Data Analysis (R, RStudio), Northeastern University Sep 2019

• Prepared the dataset by replacing null review values with 0 and converting dates into months

• Fitted a linear regression after log-transforming the price variable to show the data distribution

• Selected the most accurate model by comparing Decision Tree, Random Forest, and Gradient Boosted Tree models using R-squared and testing

Analysis of Apple App Store Dataset (R, RStudio), Northeastern University Jun 2019

• Sourced the dataset from Kaggle for analysis

• Forecasted user ratings using a Decision Tree and a two-sample t-test

• Implemented linear regression and a Decision Tree to show that user rating depends on other variables such as price and size


“An Android Application for Driver Assistance and Event Alert System Using Ultrasonic Sensor and Heart Rate Sensor.” IEEE 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). Retrieved from-
