Machine Learning Data Scientist

Location:

Bellmawr, NJ

Posted:

April 22, 2024


Resume:

*****************@*****.***

256-***-****

Haddonfield, NJ, 08033

https://www.linkedin.com/in/sunillaudari/

https://github.com/sunil7634

EDUCATION

University of Alabama in Huntsville Aug 2017 – May 2022

Ph.D. in Physics (Astrophysics) Huntsville, AL

G.P.A. 4.0/4.0

Relevant coursework: Data Analysis Math I & II

SKILLS

Programming: Python, R, SQL, Bash

Optimization: Gurobi, Pyomo

Machine Learning: Cross Validation, PCA, Logistic Regression, KNN, Random Forest, Gradient Boosting, SVM, K-means Clustering

Data Science Tools: Pandas, NumPy, Scikit-learn, Keras, TensorFlow, PyTorch

Visualization: Matplotlib, Seaborn, Plotly, Tableau

DevOps Tools: Jira, Confluence, Jenkins, ELK, Git, GitHub, SourceTree, Postman, PyCharm, Databricks

EXPERIENCE

Client: Comcast Corporation Sep 2022 – March 2024

Data Scientist/Python Developer Philadelphia, PA

Customized channel distribution with SQL, increasing solver speed threefold and reducing compute costs for a Wi-Fi mesh network.

Spearheaded the design, development, and deployment of ML solutions to optimize business decisions, saving $1M in workforce expenses by accurately forecasting radio channels.

Architected and implemented an ensemble model integrating Scikit-learn random forest and XGBoost algorithms, achieving 97% accuracy in predicting pipe seam type.

Retained 95% of the information while reducing data dimensionality from 27 to 15 features for 30k points with PCA.

Leveraged operational data sources and optimization techniques to create tools for developing scenarios with cost and enrollment optimization, delivering related real-world data insights.

Delivered actionable insights to senior management through data visualization and comparative analysis of 1M+ observations, facilitating data-driven decision-making using Matplotlib, Plotly, and Tableau.

Collaborated with cross-functional teams to understand business requirements and translate them into practical ML solutions.

University of Alabama in Huntsville (UAH) Aug 2020 – May 2022

Research Aide/Research Assistant Huntsville, AL

Created a galaxy mosaic featured on the cover of Nature Astronomy, built with Python libraries including Pandas, Seaborn, and Plotly from 100 GB of Hubble Space Telescope data.

Built a machine learning model (Laplacian Edge Detection Algorithm) to remove cosmic rays and artifacts from Hubble Space Telescope data, reducing computation time by 10 min per filter (image).

Developed a tracking system for nearby galaxies with Astroquery (like SQL) to improve catalogue accuracy, reducing error by 20%.

Enhanced data quality by 17% through cleaning and preprocessing using a comprehensive suite of Python libraries, including NumPy, Pandas, Scikit-learn, and additional tools.

Client: BBVA Bank Feb 2017 – May 2019

Junior Data Scientist Birmingham, AL

Automated classifier models such as Random Forest and SVM for specific segments of a customer base, saving 22 hours of labor per month.

Constructed operational reporting and data visualization tools, reducing contractor scheduling costs by 10% in the annual budget.

Deployed Auto-Sklearn to automate machine learning model selection, reducing modeling time by 2 hours per session.

Devised scalable solutions for Amazon EC2-based cloud environments, boosting storage efficiency by 20% and accelerating data analysis tools’ processing speed by 10% within AWS infrastructure.

Adapted configurations to align with client requirements, improving system functionality and raising overall performance by 7%.

TRAINING

Pragmatic Institute May 2022 – July 2022

Data Science Fellow Remote

Employed NLTK on thousands of scraped Reddit posts to train classification models, reaching 92% accuracy with the top-performing model (Naïve Bayes with CountVectorizer).

Forecasted the success of bank marketing campaigns using various machine learning techniques. The best model (logistic regression) achieved 92% accuracy, 93% precision, and 97% recall.

Developed multiple ML models for predicting customer churn in the European banking industry, with the Random Forest model demonstrating the best performance (F1=87%, recall=83%, precision=91%).

Achieved an average classification accuracy of 90% using Natural Language Processing (NLP) techniques, including Count Vectorizer/Hashing Vectorizer, Term Frequency-Inverse Document Frequency (TF-IDF), Tokenizing/Stemming, and Multinomial Naïve Bayes, to categorize text into various genres.
