Machine Learning Data Analyst

Location:
Bronx, NY
Posted:
July 17, 2025

Resume:

AWNISH SHANKAR

+1-716-***-**** Buffalo, NY-***** www.linkedin.com/in/awnish005/ **************@*****.*** Research Profile Portfolio

EDUCATION

State University of New York at Buffalo Aug 2023 – May 2025

Master of Science, Industrial Engineering (Major Data Science) GPA 3.5

Relevant Coursework - Operations Research, Statistical Data Mining I & II, Machine Learning, Data Analysis and Predictive Modeling, Deep Learning, Programming and Database Management, E-Business & Supply Chain Management

Indian Maritime University (Kolkata Campus) Aug 2014 – Oct 2018

Bachelor of Science, Marine Operations and Applied Mathematics GPA 3.8

WORK EXPERIENCE

SUNY RF, Division of Behavioral Medicine at University at Buffalo NY, USA

Research Data Analyst May 2024 – Present

Leveraged Python to analyze the relationship between mothers’ dietary patterns during pregnancy and postpartum, birth-related outcomes, and their children's growth and nutritional patterns for 10+ journal papers.

Trained 50+ graduate students on utilizing mixture and longitudinal statistical modeling techniques in Python.

Currently leading a team of 4 to develop an NLP-powered web application that delivers personalized text-based interventions, real-time emotional support, and relapse risk prediction to assist pregnant women in smoking cessation.

Synergy Marine Group Singapore

Data Scientist – Operations & BI May 2020 – June 2023

Leveraged 0.5TB historical data to create a maintenance forecasting model in Python with regression, time-series and statistical algorithms, attaining 87% accuracy and saving $2M annually by reducing excess inventory and optimizing vessel utilization.

Expertly orchestrated an end-to-end Extract-Transform-Load (ETL) and model training pipeline, utilizing Azure, Python, and SQL.

Created 30+ interactive Power BI dashboards integrated with data pipelines, delivering key insights on machinery performance, compliance data, human capital management, and cargo operations for vessel supervisors.

Collaborated with a cross-functional team to automate the data pipeline using SQL Server and integrate it with 10+ BI dashboards for senior management.

Optimized database performance by analyzing and tuning SQL queries, resulting in a 20% improvement in query response time and a 10% reduction in server load.

Performed extensive EDA on 700K data points from annual seafarer safety surveys, analyzing onboard practices and identifying key factors contributing to accidents, resulting in a 30% reduction in onboard incidents.

Built and deployed an anomaly detection model using deep learning to detect onboard incidents across 100+ vessels.

Technical Analyst Nov 2019 – May 2020

Developed regression models to optimize RPM, boosting engine performance by 4%, resulting in $3M+ yearly fuel savings.

Cut machinery spare parts inventory cost by 20% & saved $750K+ by resolving stock issues with Pareto and just-in-time (JIT) analysis.

Leveraged regression analysis on 0.5+ TB of historical machinery KPI data using Python, implemented 50+ regression models in Excel, and developed a Planned Maintenance System (PMS), saving $1M annually.

Operations Analyst Oct 2018 – Nov 2019

Reduced costs by 8% and transit time by 12% by applying network optimization techniques to the transportation network.

Contributed to an $800K ship revamp project, using Excel and Tableau for inventory control, cost analysis, workforce optimization, and spatial management. Achieved a 30% reduction in project time and $90K in cost savings.

Analyzed plant efficiency in Excel and advised changes to plant setups (focusing on operational KPIs), cutting port operations from 12 to 9 hours.

ACADEMIC PROJECTS

Thesis: Factors Affecting Transitions in Dual Tobacco Use Among Pregnant Women University at Buffalo Python

Analyzed dual (cigarette/e-cigarette) tobacco use transition patterns among 96K pregnant women using longitudinal data (2015-2024) to identify behavioral changes from pre-pregnancy through pregnancy.

Applied cluster analysis to identify optimal tobacco use classes across time points; used Expectation Maximization (EM) algorithm and Likelihood Ratio Test (LRT) to assess class consistency; leveraged multinomial logistic regression to estimate transition odds by sociodemographics and risk perceptions.

Findings showed a strong shift to non-use (~90%) during pregnancy; key predictors included income and harm perceptions (e.g., belief secondhand smoke harms fetus: OR=7.27, 95% CI: 2.37–22.33), driving cessation among heavy users.

Sign Language Recognition in Real Time University at Buffalo Python

Developed a sign language recognition model using computer vision and deep learning methods. Utilized OpenCV and PIL to create a 10k image dataset with data augmentation for noise addition, built CNN models using Keras, PyTorch, and TensorFlow, achieving a real-time detection accuracy of 97%.

Global Temperature Change Analysis and Predictive Modeling University at Buffalo Python, Power BI

Implemented feature engineering with 30% feature reduction using 4 selection methods, optimized 6 ML models, achieved the lowest MSE of 0.2215 with XGBoost, validated stability with 10k bootstraps, and presented insights with a Power BI dashboard.

Customer Churn Analysis and Prediction in Telecommunication University at Buffalo R

Collaborated with graduate students to predict customer churn with an 86.7% F1 score using various supervised ML algorithms, effectively handling dataset imbalance with methods like SMOTE, oversampling, and under-sampling.

Predictive Modeling for Heart Attack University at Buffalo R

Leveraged 5 advanced ML models, including Random Forest, SVM, and XGBoost, alongside 4 algorithms for hyperparameter tuning, resulting in an improved C-statistic of 0.84 (95% confidence interval [CI]: 0.69-0.88) with the Random Forest.

TECHNICAL SKILLS

Language/Software: Python (Pandas, NumPy, TensorFlow, Keras, Scikit-Learn, PIL, Matplotlib, Plotly, etc.), HTML, JavaScript, C++, SQL, R, SAS, Cplex

Data Analysis & Modeling/Optimization: Random Forest, XGBoost, Bootstrapping, Hypothesis Testing, PCA, Clustering, Generative Models, NLP, Time Series Modeling, Visualization, A/B Testing, Deep Learning, Recommendation System, Linear regression, GLMs, Linear Programming, Network Models, Sensitivity Analysis and Duality, Simplex Algorithm

Data Management & Visualization/Tools: MySQL, MongoDB, Neo4j, Tableau, Power BI, GCP, Django, Git, AWS, Apache Spark, Hadoop