May ****
Sept 2019 – Dec 2019
Sept 2019 – Dec 2019
Expected May 2021
GPA: 4/4
Jan 2020 – April 2020
May 2020 – August 2020
MITUL SHAH
**** ******** **., ***ton, MA 02215 857-***-**** ****.****@************.*** www.linkedin.com/in/mitul-shah-5b9285160/
EDUCATION
Northeastern University, Boston, MA
Master of Science, Data Analytics Engineering
Relevant Coursework: Data Mining, Database Management, Data Visualization, Probability and Statistics
Pune University, Pune, India
Bachelor of Engineering, Mechanical Engineering
Relevant Coursework: Optimization Techniques, Operational Research, Mathematics
SKILLS
Languages: R, Python, SQL
Libraries: NumPy, Pandas, Scikit-Learn, ggplot2, dplyr, stringr, lubridate, PySpark, Keras, Scrapy
Tools: Tableau, Jupyter, Spyder, MS Excel, RStudio, Minitab
ETL: MySQL, PostgreSQL, Apache Spark, Neo4j, MongoDB
ACADEMIC PROJECTS
Sparkify Data Modelling with Cassandra (Apache Cassandra, Python, Data, ETL)
• Created an Apache Cassandra database to analyze when and how users play songs on a streaming app
• Implemented an ETL pipeline to pre-process data using pandas and modelled tables according to the queries
Tesla Giga Factory Supply Chain Database Design (MySQL, EER Model, MongoDB)
• Designed conceptual EER and UML diagrams to optimize the supply chain network of the manufacturing plant
• Implemented the conceptual model in MySQL with the set constraints and imported data from Excel
• Created stored procedures and triggers, and queried the MySQL database for further analysis
• Built and queried a MongoDB database to understand and solve the optimization problem using a NoSQL DB
Reducing Commercial Aviation Fatalities (Python, Scikit-Learn, ML)
• Preprocessed pilots' physiological simulation data by aggregating the sensor data for each second, normalizing the sensor data for each crew, and handling class imbalance
• Built classification models using data mining techniques such as Decision Tree (Acc = 80%), Random Forest (Acc = 89%), and Extreme Gradient Boosting (Acc = 90%) to predict the cognitive state of the pilot
PIMA Indian Diabetes Data Analysis (RStudio, Hypothesis Testing, Logistic Regression)
• Cleaned and normalized the data and performed hypothesis testing to identify the factors significantly affecting diabetes
• Developed a logistic regression model to predict diabetic condition of a patient based on several medical factors
• Identified the major contributing factors to diabetes and their order of importance, and validated the results
New-York Hospital Inpatient Discharge Analysis (RStudio, Tableau)
• Parsed a large dataset of over 2.54 million records and 34 attributes to analyze inpatient discharges
• Created visualizations using R and Tableau to understand the factors that affect a patient's length of stay in the hospital
• Devised solutions for hospitals, such as reviewing treatment data, disease patterns, hospital usage, and payment methods, and planning health care to improve the experience of patients in New York State hospitals
EXTRA CURRICULAR
• Member of Data Club at Northeastern University