Data Scientist - Analyst, Python, R, SQL

Boston, MA
February 19, 2020


SMEET PATEL +1-857-***-**** LinkedIn Boston MA 02120


Northeastern University Boston, MA

Master of Science in Engineering Management (Focus: Data Science) GPA: 3.88 September 2017 – December 2019

• Relevant Coursework: Data Mining in Engineering, Data Management and Database Design, Collect/Store/Retrieve Data, Data Warehousing and Business Intelligence, Probability and Statistics, Parallel Processing for Data Analytics

Gujarat Technological University Ahmedabad, India

Bachelor of Engineering in Mechanical Engineering GPA: 3.8 July 2011 – June 2015

Skills and Interests

• Languages and Technologies: Python (NumPy, Pandas, Matplotlib), R, SQL, Hadoop, Apache Spark, Tableau, Power BI, Amazon Web Services (EC2, S3, Redshift), Microsoft Azure (DSVM), TensorFlow, Keras, PyTorch

• Databases: MongoDB, IBM DB2, Teradata, MySQL, PostgreSQL

• Core Competencies: Data Mining-Warehousing-Analytics, Machine Learning, Business Intelligence (Visualization, Reporting), ETL, Statistical Modeling, Natural Language Processing (NLP), Text Mining, Quantitative Analysis

Work Experience

Norfolk Southern Corporation Atlanta, GA

Data Scientist Intern Operations Research Department Fortune 500 January 2019 – May 2019

• Accelerated discrepancy analysis by over 25x by writing a text mining script in R that compares text reports to MongoDB data

• Reduced assessment time by up to 70% by creating a Python program to compare live inventory data from Teradata and MongoDB

• Performed ETL from AWS Redshift and IBM DB2 to conduct pattern analysis on billions of records from all terminals

• Increased IEMS application’s on-hand cars reporting accuracy by 19% by developing an anomaly detection model in Python to identify and remove false positives coming from different cases

• Built an ARIMA model with 91% accuracy in Python, incorporating STL decomposition, for demand forecasting at Austell yard

• Ran validation testing for the ‘dwell time’ feature by creating a Python script in Test and QA before its deployment to production

• Devised schedule analysis methodology by creating complex SQL queries and stored procedures and analyzing the output

Northeastern University Boston, MA

Graduate Teaching Assistant Engineering Economy September 2018 – December 2018

• Mentored a class of 38 undergraduate students, graded assignments, prepared lectures, held office hours and proctored exams

• Explained topics on Predictive Financial Analytics: Time Value of Money, DCFs, Inflation, and Capitalization

Eco Polymers Surat, India

Data Analyst Operations and Finance June 2015 – July 2017

• Decreased manual effort by 80% by automating financial and sales metrics evaluation of client portfolio using Python

• Engineered a revenue maximization model by creating RFM analysis with K-Means Clustering in Python for Customer Zoning

• Developed compelling self-service dashboards using Tableau measuring KPIs and provided integrated insights to managers

• Improved P2P process efficiency by 30% by overhauling supplier on-boarding, PR-PO cycle time and on-time payment

• Conducted exploratory and retrospective analyses of operational and financial data to amplify competitive performance

• Built and managed financial statistics models and provided data-based executive-level ad-hoc analysis

Academic and Independent Projects

Neural Network Forecasting Model for Stock Prices (Python: TensorFlow) July 2019 – August 2019

• Created a 3-layer LSTM based Neural Network on Microsoft Azure’s DSVM platform for stock market price point prediction

• Optimized the model by creating First Order Optimizer with Gradient Descent and achieved model accuracy of 93%

Predictive Analysis - MBTA Red Line Reliability and Passenger Traffic (R) June 2018 – August 2018

• Built predictive Machine Learning algorithms for traffic prediction in R using Linear Regression and ETS with 87% accuracy

• Implemented hypothesis testing, normality test, and ANOVA to present descriptive statistics of reliability of Red Line over time

Big Data Analysis - Chicago Metro Area Crime (R, MongoDB) April 2018 – June 2018

• Capitalized on the performance of MongoDB by attaching and cleaning big data (6.6 million rows) using mongolite in R

• Elicited, examined and visualized correlations and patterns among variables to provide comprehensive inferences