SMEET PATEL
**********.***@*****.*** +1-857-***-**** LinkedIn Boston, MA 02120
EDUCATION
Northeastern University Boston, MA
Master of Science in Engineering Management (Focus: Data Science) 3.88 September 2017 – December 2019
Relevant Coursework: Data Mining in Engineering, Collect/Store/Retrieve Data, Data Management and Database Design, Parallel Processing, Engineering Statistics, Data Warehousing and BI, Operations Research
Gujarat Technological University Ahmedabad, India
Bachelor of Engineering in Mechanical Engineering 3.8 July 2011 – June 2015
SKILLS AND INTERESTS
Languages & Technologies: Python (NumPy, pandas, scikit-learn, SciPy, Matplotlib, Seaborn), R (tidyverse, dplyr, ggplot2), SQL, Apache Spark (PySpark), Tableau, AWS (Redshift, Glue, S3, SageMaker), MS Azure (DSVM), TensorFlow, Keras, PyTorch
Databases: MongoDB (NoSQL), IBM DB2, Teradata, MySQL, PostgreSQL, SQL Server
Core Competencies: Data Analytics, Machine Learning (Regression, kNN, SVM, Random Forest, Decision Trees, Neural Networks: CNN, RNN), Business Intelligence, NLP, Statistical Analysis (Bayesian, Hypothesis Testing, A/B Testing, ANOVA)
WORK EXPERIENCE
Center for Translational Neuroimaging Boston, MA
Research Data Scientist Pharmaceutical Sciences February 2020 – Present
• Capitalized on performance boost of Amazon EMR with Spark to create critical data analysis reports
• Created R-CNN machine learning model in Python using TensorFlow with GPU to detect patients with CNS disease
• Migrated on-premises data from disparate sources to Amazon S3 to create a central repository for data analysis and ML
• Conveyed results of data analysis solutions to fellow scientists by creating data visualizations in Tableau
Norfolk Southern Corporation Atlanta, GA
Data Scientist Operations Research Co-op Fortune 500 January 2019 – May 2019
• Increased IEMS application’s on-hand cars reporting accuracy by 19% by developing an anomaly detection model in Python
• Reduced assessment time by up to 70% using Python to compare live inventory data from Teradata and MongoDB
• Performed ETL to an S3 data lake using AWS Glue to conduct pattern analysis on billions of records from all terminals
• Devised schedule analysis methodology by creating complex SQL queries and stored procedures to analyze output
• Built ARIMA model with 91% accuracy and STL decomposition in Python for demand forecasting at Austell yard
• Accelerated big data analytics using Apache Spark to gain insights into different business aspects
Eco Polymers Surat, India
Data Analyst Operations and Finance June 2015 – July 2017
• Engineered a revenue maximization model with RFM scoring for customer zoning in Python, increasing revenue by 30%
• Automated financial and sales metrics evaluation of client portfolio using Python, reducing manual effort by 80%
• Developed compelling self-service dashboards in Tableau monitoring KPIs and provided integrated insights
• Created an LSTM-based deep learning model for sales prediction using Keras, achieving 84% accuracy
• Performed ad-hoc analysis by writing SQL queries to serve various business-data demands
• Built and managed financial and sales statistics models and provided data-based executive-level reports
PROJECTS
Neural Network Forecasting Model for Stock Prices (Python - TensorFlow, Keras) July 2019 – August 2019
• Created a 3-layer LSTM-based RNN on MS Azure’s DSVM platform for stock market price point prediction
• Optimized the model through hyperparameter tuning and a first-order gradient descent optimizer; model accuracy of 93%
Solar Panel Size Prediction (Python - scikit-learn) February 2019 – April 2019
• Built linear, ridge, and lasso regression models using scikit-learn in Python
• Performed Feature Engineering, Cardinality Reduction; used k-fold cross-validation to assess model performance
Predictive Analysis – MBTA Reliability & Passenger Traffic (R) June 2018 – August 2018
• Built machine learning models for traffic prediction in R using Linear Regression and ETS with 87% accuracy
• Implemented hypothesis testing, normality test and ANOVA to present descriptive statistics of reliability over time