M.S Data Analytics Engineering Student at Northeastern University

Location:
Boston, MA
Posted:
October 11, 2020

Resume:

May ****

GPA: *.*/**

Jan **** – April ****

Sept 2019 – Dec 2019

Sept 2019 – Dec 2019

Expected May 2021

GPA: 4/4

Jan 2020 – April 2020

May 2020 – August 2020

MITUL SHAH

**** ******** **., ***ton, MA 02215 | 857-***-**** | ****.****@************.*** | www.linkedin.com/in/mitul-shah-5b9285160/

EDUCATION

Northeastern University, Boston, MA

Master of Science, Data Analytics Engineering

Relevant Coursework: Data Mining, Database Management, Data Visualization, Probability and Statistics

Pune University, Pune, India

Bachelor of Engineering, Mechanical Engineering

Relevant Coursework: Optimization Techniques, Operational Research, Mathematics

SKILLS

Languages: R, Python, SQL

Libraries: NumPy, Pandas, Scikit-Learn, ggplot2, dplyr, stringr, lubridate, PySpark, Keras, Scrapy

Tools: Tableau, Jupyter, Spyder, MS Excel, RStudio, Minitab

ETL: MySQL, PostgreSQL, Apache Spark, Neo4j, MongoDB

ACADEMIC PROJECTS

Sparkify Data Modelling with Cassandra (Apache Cassandra, Python, Data, ETL)

• Created an Apache Cassandra database to analyze when and how users play songs on a streaming app

• Implemented an ETL pipeline to preprocess data using pandas and modelled tables according to the queries

Tesla Giga Factory Supply Chain Database Design (MySQL, EER Model, MongoDB)

• Designed conceptual EER and UML diagrams to optimize the supply chain network of the manufacturing plant

• Implemented the conceptual model in MySQL with the defined constraints and imported data from Excel

• Created stored procedures and triggers, and queried the MySQL database for further analysis

• Built and queried a MongoDB database to understand and solve the optimization problem using a NoSQL database

Reducing Commercial Aviation Fatalities (Python, Scikit-Learn, ML)

• Preprocessed pilots' physiological simulation data by aggregating the sensor data for each second, normalizing the sensor data for each crew, and handling imbalanced data

• Built classification models using data mining techniques such as Decision Tree (accuracy = 80%), Random Forest (accuracy = 89%), and Extreme Gradient Boosting (accuracy = 90%) to predict the cognitive state of the pilot

PIMA Indian Diabetes Data Analysis (RStudio, Hypothesis Testing, Logistic Regression)

• Cleaned and normalized the data and applied hypothesis testing to identify the factors that significantly affect diabetes

• Developed a logistic regression model to predict a patient's diabetic condition from several medical factors

• Identified the major contributing factors to diabetes and their order of importance, and validated the results

New York Hospital Inpatient Discharge Analysis (RStudio, Tableau)

• Parsed a large dataset of over 2.54 million records and 34 attributes to analyze inpatient discharges

• Created visualizations in R and Tableau to understand the factors that affect a patient's length of stay in the hospital

• Proposed measures for hospitals, such as reviewing treatment data, disease patterns, hospital usage, and payment methods, and planning health care, to improve the patient experience in New York State hospitals

EXTRACURRICULAR

• Member of Data Club at Northeastern University