Jason Spaw
Lead Data Scientist
ML-Ops Architect/ GenAI Specialist
Phone: 534-***-**** Email: *********@*****.***
Profile Summary
I have 16 years of IT experience, including 12 years in data science and data analytics, with expertise in developing machine learning solutions to solve business problems. My background includes Python development, and I am well-versed in various machine learning techniques, such as Linear and Logistic Regression, Neural Networks, Decision Trees, and Ensemble Methods. Additionally, I am comfortable with deployment and integration on cloud technologies like AWS and Azure.
Experience applying Naïve Bayes, Regression and Classification Analysis, Neural Networks/Deep Neural Networks, Decision Trees/Random Forests, and Boosting machine learning techniques.
Experience building statistical models on large data sets using cloud computing services such as AWS, Azure, and GCP.
Applying statistical analysis and machine learning to live data streams from big data sources using Spark, as well as batch processing.
Applying statistical and predictive modeling methods to build and design reliable systems for real-time analysis and decision-making.
Expertise in developing creative solutions to business use cases through data analysis, statistical modeling, and innovative thinking.
Performing EDA to find patterns in business data and validate findings using state-of-the-art modeling and algorithms.
Leading teams to produce statistical or machine learning models and create APIs or data pipelines for the benefit of business leaders and product managers.
Deep knowledge of statistical procedures that are applied in both Supervised and Unsupervised Machine Learning problems.
Experience in applying Machine Learning techniques for sales and marketing teams to provide forecasting and improve decision-making.
Excellent communication and presentation skills with experience in explaining complex models and ideas to both fellow team members and non-technical stakeholders.
Leading teams to prepare clean data pipelines and design, build, validate, and refresh machine learning models.
Technical Skills
Analytic Development: Python, R, Spark, SQL
Python Packages: NumPy, pandas, scikit-learn, TensorFlow, Keras, PyTorch, fastai, SciPy, Matplotlib, Seaborn, Numba
Programming Tools: Jupyter, RStudio, GitHub, Git
Cloud Computing: Amazon Web Services (AWS), Azure, Google Cloud Platform (GCP)
Machine Learning: Natural Language Processing & Understanding, Machine Intelligence, Machine Learning Algorithms
Analysis Methods: Advanced Data Modeling, Forecasting, Predictive Analysis, Statistical Inference, Exploratory Analysis, Stochastic Modeling, Bayesian Analysis, Regression Analysis, Linear Models, Multivariate Analysis, Sampling Methods, Segmentation, Clustering, Sentiment Analysis
Statistical Analysis: Predictive Analytics, Decision Analytics, Big Data and Query Interpretation, Design and Analysis of Experiments
Artificial Intelligence: Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, RNN, Regression, Naïve Bayes
Natural Language Processing: Text analysis, classification, pattern recognition, sentiment analysis
Deep Learning: Machine perception, Data Mining, Machine Learning algorithms, Neural Networks, TensorFlow, Keras, PyTorch, Transfer Learning
Data Modeling: Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, time-series analysis
Applied Data Science: Natural Language Processing, Machine Learning, Social Analytics, Predictive Maintenance
Soft Skills: Excellent communication and presentation skills; ability to work well with stakeholders to discern needs accurately; leadership; mentoring
Professional Experience
ML-Ops Architect/ GenAI Specialist September 2023 – Present
Northern Mutual, Milwaukee, WI
Summary: At Northern Mutual, I developed an NLP solution using TensorFlow/Keras with Google’s BERT model to analyze customer feedback and claims communications in healthcare insurance. This high-accuracy model, deployed via AWS, monitors customer sentiment in real-time, alerting us to spikes in complaints or positive feedback. Additionally, I integrated OpenAI's GPT-3.5 Turbo within a REST API framework to automate customer responses, enhancing engagement. As the ML-Ops engineer, I ensured the smooth production deployment and operation of these models, enabling Northern Mutual to improve customer experience through proactive insights and automation.
Developed and deployed GenAI models to enhance predictive analytics, focusing on personalized financial recommendations and investment insights.
Leveraged Generative AI and Natural Language Processing (NLP) to analyze unstructured data, providing insights for personalized financial planning and customer support.
Built and fine-tuned transformer models (e.g., BERT, GPT) for client segmentation, recommendation systems, and healthcare-related question answering, enhancing accuracy and engagement.
Collaborated with cross-functional teams to integrate fine-tuned GenAI models into broader NLP projects, improving customer support and claims processing workflows.
Served as an ML-Ops Architect, utilizing Docker and Kubernetes for model deployment on Amazon EKS Clusters, and orchestrating inference and retraining pipelines using Jenkins and Apache Airflow.
Implemented model monitoring with AWS Lambda, automatically triggering retraining pipelines via MLflow to maintain model accuracy over time.
Employed A/B testing and canary deployments for model updates in production, minimizing disruptions while optimizing performance.
Curated and processed large text datasets through tokenization, stopword removal, and normalization to ensure high data quality.
Trained models using TensorFlow and PyTorch, optimizing performance through hyperparameter tuning and evaluation metrics such as F1 Score and Exact Match for BERT.
Pre-processed text data with stemming, lemmatization, and vectorization to standardize inputs and improve model efficiency.
Cleaned, preprocessed, and transformed raw data to make it suitable for analysis and machine learning model training, addressing issues such as missing values and outliers through AI methodologies.
Evaluated and selected appropriate machine learning algorithms based on AI principles, considering the problem domain and data characteristics to achieve optimal results.
Developed Latent Dirichlet Allocation (LDA) topic modeling algorithms to identify key themes in customer communications, enhancing insights for claims processing.
Utilized Python libraries such as Gensim and NLTK for text preprocessing and LDA modeling, enhancing data interpretability.
Fine-tuned BERT and GPT models specifically for the healthcare domain, boosting accuracy in claims-related text generation and response tasks.
Deployed GenAI models on cloud platforms (AWS, Azure, Google Cloud) to ensure scalability and availability for customer service and insurance agents.
Exposed models as RESTful APIs to facilitate real-time sentiment analysis and automated response generation for customer interactions.
Conducted load testing and optimized API performance to ensure rapid response times for end-user applications.
Implemented security protocols, including authentication and authorization, to safeguard sensitive data while complying with healthcare privacy regulations.
Continuously monitored and optimized AI models, applying machine learning strategies to improve accuracy and efficiency through techniques such as hyperparameter tuning.
Collaborated with cross-functional teams to align AI solutions with business objectives, ensuring that machine learning applications were successfully implemented and adopted.
Continuously retrained models to adapt to evolving customer needs and data trends, ensuring long-term relevance and accuracy.
Fine-tuned GPT-3.5 models on healthcare-specific data, reducing perplexity and improving coherence for automated text generation.
Evaluated fine-tuned model performance through human assessment and automated metrics, improving accuracy in customer inquiry responses.
Designed data visualization dashboards in Power BI and Tableau, translating GenAI model outputs into actionable business insights.
Documented GenAI model development processes, creating reusable code libraries to support continuous improvement and cross-team knowledge sharing.
Engaged in ongoing research and applied GenAI advancements, driving innovative solutions aligned with Northern Mutual’s objectives.
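The text-curation steps above (tokenization, stopword removal, normalization) can be sketched in plain Python; the stopword list and function name here are illustrative placeholders, not the production pipeline:

```python
import re

# Illustrative stopword list; a production pipeline would use a fuller set.
STOPWORDS = {"the", "a", "an", "is", "was", "to", "and", "of", "my"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize, and drop stopwords."""
    text = text.lower()
    # Normalization: collapse anything that is not a letter or digit.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("The claim WAS denied, and my appeal is pending!"))
# → ['claim', 'denied', 'appeal', 'pending']
```

In practice this sits in front of the tokenizer of the downstream model (e.g. BERT's WordPiece tokenizer), which handles subword splitting itself.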
Lead Data Scientist Nov 2021 – Aug 2023
Dominion Energy, Richmond, VA
Summary: Leveraged advanced data science techniques to optimize energy operations and improve customer experiences. Developed and deployed predictive models to forecast energy consumption patterns and identify potential savings. Conducted in-depth data analysis to uncover valuable insights and inform data-driven decision-making. Implemented natural language processing techniques to extract valuable information from customer feedback and social media, enhancing customer understanding and satisfaction.
Extracted relevant data from SQL databases using complex SQL queries, preparing it for subsequent analysis and modeling.
Developed, trained, and evaluated a variety of machine learning models, including Decision Trees, Random Forests, Linear Regression, Neural Networks, Logistic Regression, and Gradient Boosted Trees, to predict potential energy consumption patterns and identify opportunities for energy efficiency improvements.
Leveraged XGBoost to select the most influential features for predicting energy consumption, improving model accuracy and interpretability.
Conducted in-depth EDA on diverse energy-related datasets, including weather data, historical consumption patterns, and demographic information, to uncover underlying trends and patterns.
Implemented data preprocessing techniques, such as normalization, imputation, and noise reduction, to ensure data quality and consistency.
Developed and validated machine learning models using supervised and unsupervised learning techniques to extract valuable insights from energy consumption data.
Applied feature engineering techniques, including PCA and feature scaling, to identify and select the most relevant features for improving model performance.
Employed techniques like Random Train/Test Split and K-NN Regression to evaluate model performance and ensure generalizability.
Defined and calculated relevant performance metrics, such as accuracy, precision, recall, and F1-score, to assess model effectiveness.
Validated and tested models on external datasets to assess their performance in real-world scenarios.
Utilized Python for clustering analysis to identify distinct customer segments based on energy consumption patterns and employed NetLogo for simulation-based analysis.
Employed statistical methods and Python's visualization libraries to explore energy consumption data, identify trends, and visualize findings effectively.
Applied clustering techniques like K-means, Gaussian Mixture Models, and DBSCAN to segment customers based on energy usage patterns, enabling targeted marketing and energy efficiency programs.
Conducted exploratory data analysis on socioeconomic factors influencing energy consumption, identifying correlations between variables and informing targeted interventions.
Experimented with various classification algorithms, such as decision trees, logistic regression, and KNN, to predict customer behavior and optimize energy delivery.
Utilized time series analysis techniques, such as SARIMA, to forecast future energy demand, enabling proactive planning and resource allocation.
Designed, implemented, and deployed AI-powered solutions to address complex energy challenges, improving operational efficiency and customer satisfaction.
Leveraged NLP techniques to extract insights from unstructured text data, such as customer feedback and social media, enhancing customer understanding and sentiment analysis.
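The usage-based customer segmentation described above can be illustrated with a from-scratch k-means; the production work used library implementations, so this is only a minimal sketch, and the point values and seed are invented for the example:

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Recompute centroids; keep the old one if a cluster emptied.
        centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return labels, centroids

# Hypothetical customers as (average kWh, peak kWh) pairs.
usage = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8),
         (8.0, 12.0), (8.3, 11.5), (7.8, 12.4)]
labels, _ = kmeans(usage, k=2)
```

With well-separated usage profiles like these, the low-usage and high-usage customers land in different clusters regardless of initialization.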
Sr. NLP / ML-Ops Engineer Sep 2019 – Oct 2021
Sanofi, Bridgewater, NJ
Summary: Worked on a sales forecasting project using an artificial neural network developed in PyTorch along with Facebook’s Prophet model. I performed data cleaning in Python on a large dataset including several years’ worth of data across different departments in dozens of shops and produced highly accurate forecasts for each store and department.
Investigated time series models such as Vector ARIMA, SARIMA, and Facebook Prophet for forecasting and analysis.
Utilized libraries such as Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn in Python to develop various machine learning algorithms. These algorithms included linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN.
Implemented machine learning algorithms and concepts such as K-means Clustering, Gaussian distribution, and decision trees.
Created AI-driven computer vision solutions that utilize machine learning for image recognition, object detection, and video analysis to automate processes and enhance user experience.
Integrated diverse data sources into AI frameworks, employing machine learning algorithms to build comprehensive datasets for model training and validation, ensuring data quality and relevance.
Analyzed data using data visualization and feature extraction tools, along with supervised machine learning techniques to meet project objectives.
Examined large data sets and applied machine learning techniques to develop predictive and statistical models.
Conducted Supervised, Unsupervised, and Semi-Supervised classification and clustering of warehouse inventory in the analysis.
Classified documents using machine learning techniques such as Neural Network and Deep Learning, K-neighbors, K-means, Random Forest, Logistic Regression, and SVM.
Constructed an analysis model to optimize resource allocation and warehouse layout.
Documented logical data models, semantic data models, and physical data models.
Performed complex modeling, simulation, and data and process analysis.
Conducted analysis through different regression and ensemble models in machine learning for forecasting.
Deployed machine learning models in production environments, utilizing AI tools to monitor their performance over time and implementing necessary updates and improvements.
Documented machine learning model development processes and results, effectively communicating findings and recommendations related to AI solutions to stakeholders through reports and presentations.
Employed key indicators in Python and machine learning concepts like regression, Bootstrap Aggregation, and boosting for analysis.
Created a model using Facebook Prophet that delivered highly accurate predictions of weekly sales.
Evaluated model performance on a large dataset, which included multiple years of daily data for dozens of departments per shop and dozens of shops.
Deployed the model to produce highly accurate forecasts up to six months in advance for every shop and department.
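As context for the forecasting work above, a seasonal-naive baseline (each future day predicted from the value one season earlier) is the standard yardstick that models like Prophet must beat; this sketch and its MAPE helper are illustrative, not the deployed model:

```python
def seasonal_naive(history, season=7, horizon=14):
    """Forecast each future step with the value from one season earlier."""
    forecast = []
    series = list(history)
    for _ in range(horizon):
        forecast.append(series[-season])  # same weekday, previous week
        series.append(forecast[-1])
    return forecast

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a)
                     for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical daily sales with a weekly pattern (Saturday spike).
history = [10, 12, 14, 16, 18, 20, 30] * 3
print(seasonal_naive(history, season=7, horizon=7))
# → [10, 12, 14, 16, 18, 20, 30]
```

Reporting a candidate model's MAPE next to this baseline's MAPE makes "highly accurate" a concrete, comparable claim.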
Sr. Data Scientist Feb 2017 – Aug 2019
Regions Financial Corporation, Birmingham, AL
Summary: Improved financial performance and customer satisfaction through data-driven insights. Developed and deployed sophisticated predictive models to forecast trends, assess risk, and optimize decision-making. Leveraged advanced machine learning techniques to enhance fraud detection, credit scoring, and customer segmentation. Collaborated with business stakeholders to identify opportunities for data-driven innovation, leading to increased efficiency and revenue.
Cleaned and transformed financial and customer data from various sources, creating structured datasets for subsequent analysis and modeling.
Provided data science consulting services to business units, identifying opportunities to leverage data to optimize processes, improve decision-making, and uncover new business opportunities.
Led data science projects from inception to completion, delivering actionable insights and recommendations to stakeholders.
Integrated diverse datasets, including transactional, customer profile, and marketing data, to create a comprehensive view of customer behavior and financial performance.
Developed and refined predictive models, such as regression, ARIMA, and time series analysis, to forecast financial metrics and assess risk exposure.
Enhanced model accuracy and predictive power through techniques like feature engineering, hyperparameter tuning, and model selection.
Preprocessed transactional and behavioral data, applying feature selection and dimensionality reduction techniques like PCA to improve model efficiency and performance.
Implemented machine learning models, including Support Vector Machines (SVM) and neural networks, to address critical financial challenges like fraud detection, credit scoring, and risk assessment.
Utilized clustering algorithms, such as K-means and Gaussian Mixture Models, to segment customers based on their behavior and preferences, enabling targeted marketing campaigns and personalized financial products.
Achieved high levels of accuracy, precision, and recall in model performance, ensuring reliable and actionable insights for financial decision-making.
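The accuracy, precision, and recall figures cited above follow directly from confusion-matrix counts; the counts in this sketch are invented for illustration (e.g. a hypothetical fraud screen over 1,000 transactions):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)          # of flagged cases, how many were fraud
    recall = tp / (tp + fn)             # of actual fraud, how much was caught
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts: 80 true positives, 20 false positives,
# 20 missed frauds, 880 correctly cleared transactions.
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=880)
```

For imbalanced problems like fraud, precision/recall/F1 carry the signal; accuracy alone can look high even for a useless model.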
Sr. Data Scientist Oct 2014 – Jan 2017
BMW, Woodcliff Lake, NJ
Summary: At BMW, I developed computer vision-based OCR models using Python, OpenCV, and TensorFlow to optimize document processing workflows. I collaborated with Finance to enhance Settlement Payment Reports, building data pipelines that linked SKUs with corresponding UPC/EAN codes and enabling automated data classification. Additionally, I streamlined machine learning operations, deploying models and setting up monitoring pipelines to detect model drift and ensure continuous improvement.
Implemented a CNN-based Tesseract OCR solution for named entity recognition (NER).
Collaborated with Finance to troubleshoot and enhance Settlement Payment Reports for cash reconciliation.
Built a data pipeline to link SKUs with corresponding UPC/EAN codes.
Updated Python scripts to align training data with our AWS Cloud Search database, enabling classification by assigning response labels.
Developed computer vision-based Optical Character Recognition (OCR) models.
Validated data for the transition from MWS to SPAPI.
Partnered with cross-functional teams, including data scientists, user researchers, product managers, designers, and engineers, to improve the consumer experience across platforms.
Conducted analysis on large datasets to derive insights on user behavior, guiding product and design decisions.
Built, trained, and deployed machine learning models.
Designed and implemented a CI/CD pipeline to automate model development and deployment.
Set up model monitoring pipelines to trigger retraining when model drift was detected.
Utilized Pandas and Feature Tools in Python for data analytics, data cleaning, and feature engineering.
Performed analytics for Supply Chain Reporting, utilizing Power BI.
Developed internal reporting dashboards in Periscope.
Extracted data from Amazon Redshift within the AWS cloud environment.
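The drift-triggered retraining described above can be illustrated with one common drift signal, the Population Stability Index (PSI) between the training-time and live score distributions; the bins, threshold, and score values below are assumptions for the sketch, not the production configuration:

```python
import math

def psi(expected, actual, bins):
    """Population Stability Index between a baseline (training) score
    distribution and a live score distribution, over shared bins."""
    def frac(scores, lo, hi):
        n = sum(1 for s in scores if lo <= s < hi)
        return max(n / len(scores), 1e-6)  # floor avoids log(0)
    total = 0.0
    for lo, hi in bins:
        e, a = frac(expected, lo, hi), frac(actual, lo, hi)
        total += (a - e) * math.log(a / e)
    return total

def needs_retraining(expected, actual, bins, threshold=0.2):
    """A PSI above ~0.2 is a common rule of thumb for significant drift."""
    return psi(expected, actual, bins) > threshold

# Hypothetical model scores: balanced at training time, skewed live.
bins = [(0.0, 0.5), (0.5, 1.01)]
baseline = [0.1] * 50 + [0.9] * 50
live = [0.9] * 95 + [0.1] * 5
```

In the deployed setup this check ran on a schedule, with a retraining pipeline kicked off whenever the threshold was crossed.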
Data Scientist Jan 2012 – Sep 2014
PetSmart, Phoenix, AZ
Summary: At PetSmart in Phoenix, I developed NLP-driven models using LDA for topic clustering to understand customer sentiment and leveraged K-Means to attribute causes to negative reviews. I applied PCA to manage high-dimensional data, enhancing demand forecasting accuracy with MLlib and Gradient Boosted Trees (GBT). Additionally, I designed data pipelines with Apache Airflow and implemented hyperparameter tuning to optimize model performance. By integrating diverse data sources and using scalable analytics in Hadoop and SQL, I provided data-driven insights to improve customer satisfaction and operational efficiency.
Conducted NLP-driven proof-of-concepts with Latent Dirichlet Allocation (LDA) for topic clustering.
Applied Principal Component Analysis (PCA) to manage high-dimensional sparse categorical data effectively.
Leveraged NLP techniques and K-Means clustering to analyze and attribute causes to negative customer reviews.
Enhanced forecast accuracy through sophisticated data engineering practices.
Used hyperparameter tuning to maximize model performance.
Skilled in cloud computing for full-cycle machine learning implementations.
Designed and managed data pipelines and workflows using Apache Airflow.
Built and optimized machine learning models for demand forecasting using MLlib and Gradient Boosted Trees (GBT).
Performed web scraping to create Amazon review datasets for analysis.
Developed predictive models to assess and mitigate delivery delay risks.
Built demand forecasting models utilizing IRI syndicated data.
Leveraged Hadoop and SQL databases for scalable data analytics.
Integrated diverse data from internal and external sources to generate comprehensive datasets for analysis.
Implemented best practices in database design, data storage, and retrieval processes.
Established data governance policies to maintain data privacy and security.
Collaborated with stakeholders to refine model features, improving prediction accuracy.
Integrated multiple data sources into a cohesive master dataset.
Monitored and fine-tuned data systems and pipelines to align with business requirements.
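The hyperparameter tuning mentioned above can be sketched as an exhaustive grid search; the parameter grid and toy scoring function here are invented for illustration:

```python
import itertools

def grid_search(train_fn, score_fn, grid):
    """Try every combination in the hyperparameter grid and return
    the best-scoring parameters and their score."""
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = score_fn(train_fn(params))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy example: the "model" is just its params, and the score peaks
# at depth=4, lr=0.1 (a stand-in for validation accuracy).
grid = {"depth": [2, 4, 8], "lr": [0.01, 0.1]}
best, score = grid_search(
    lambda p: p,
    lambda m: -abs(m["depth"] - 4) - abs(m["lr"] - 0.1),
    grid,
)
```

For the GBT models above, `train_fn` would fit a model with the given parameters and `score_fn` would evaluate it on a held-out validation set; random or Bayesian search scales better when the grid grows.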
Data Analyst Nov 2009 – Dec 2011
News Corp, New York, NY
Oversaw data processing, cleaning, and validation to maintain data integrity for analysis.
Performed univariate, bivariate, and multivariate analyses to generate and assess new features.
Optimized feature extraction in the machine learning pipeline, greatly improving system efficiency.
Contributed to demand forecasting to maximize port utilization.
Addressed analytical challenges and effectively communicated methods and insights.
Integrated external data sources through data mining and scraping techniques.
Improved data collection methods to capture critical information for analytical system development.
Data Consultant Jan 2008 – Oct 2009
Marriott, Bethesda, MD
Developed and implemented data-driven solutions to optimize guest experience and operational efficiency.
Performed in-depth data analysis to identify trends and insights for informed business decisions.
Collaborated with cross-functional teams to integrate data across platforms, enhancing reporting and predictive accuracy.
Ensured data quality and governance to maintain data integrity for strategic initiatives.
Conducted training sessions for staff on data analytics tools and best practices to enhance data literacy within the organization.
Education
M.S. Data Science and Analytics
University of Missouri
B.S. Chemical Engineering
Missouri University of Science and Technology