SWAPNALI GUJAR
*************@*****.*** +1-862-***-**** Bargersville, IN 46106
Professional Summary
Principal Data Scientist with a proven track record of delivering data science projects in predictive analytics, machine learning, Gen-AI, and NLP across a range of business use cases, generating significant value and cost avoidance. Led projects from concept to production using Python, R, and Agile methodologies. 19+ years of professional experience, including 10+ years in leadership roles: mentoring teams, leading by example with hands-on data science expertise, and bringing business and AI/ML teams together.
Accomplishments
Led the successful delivery of scalable AI/ML end-to-end solutions that generated over $25M in business value.
Successful track record of managing a team of Data Scientists, Data Engineers, and Generative AI Engineers (15+ FTE and contractors), including hiring, talent management, and business stakeholder engagement to drive success in AI/ML and generate ROI.
Primary inventor of a patent on a data science-driven project titled 'Systems and Methods for Determining Exhibited Useful Life of Sensors in Monitored Systems (NP US),' resulting in Patent Number: 11,959,433.
Received first prize in Indiana University's data-thon and hackathon 2023 on hate speech, developing NLP models for predicting racial bias and identifying bias in social media content.
Filed multiple Intellectual Property Inventions (IP) and Trade Secrets based on data science projects.
Received the 'Customer Success Award' and the 'Data Science Innovation Award' at Cummins.
Served as a champion of employee volunteering in community engagement efforts.
Finalist in Databricks Gen-AI World Cup Hackathon 2024
Featured Speaker at the Women and Technology Summit 2025 at Indiana University, presenting on the practical implementation of Gen-AI in real-life scenarios and its applications at the intersection of technology and society: https://womenandtech.indiana.edu/summit/breakouts.html
Skills
· Python, R Programming, SQL
· Gen-AI, LLM model Development, Natural Language Processing, RAG Pipeline, AI Agents
· Machine Learning, Data Visualization (Tableau, Matplotlib, Seaborn)
· Databases (SQL, MongoDB, Neo4j), Agile Methodology
· Databricks, Microsoft Azure, AWS SageMaker, Google Colab
· Neural Networks, Deep Learning (Keras, TensorFlow, PyTorch)
· Big Data Analytics, Cloud Computing, MLOps, CI/CD using GitLab, Web Scraping
Work History
Principal Data Scientist at Cummins Indiana, 02/2021 - Current
Cummins Engine Prognostics Modeling: I led the development of a robust data science and feature engineering pipeline that enabled the delivery of advanced statistical and machine learning models. This work was instrumental in preventing catastrophic failures of Cummins diesel engines in the mining industry, resulting in cost avoidance of ~$12M per year for Cummins and its customers. Using predictive analytics algorithms such as multiple linear regression, KNN, random forest, decision tree, and XGBoost, along with several clustering algorithms, I led the development and deployment of predictive insights based on time series data to forecast potential engine failures worldwide.
Relevant Service Request Recommender: I architected a data science and data engineering pipeline to deliver an end-to-end recommendation engine for Cummins Field Service engineers. Using textual data, this solution reduced the time to close service requests by an average of 11 days, generating ~$1M in annual cost avoidance. My responsibilities on this project included recommendation model development, multi-class classification model development using NLP, Gen-AI/LLM, and machine learning techniques, business communication, project entitlement, value derivation, and coaching and mentoring junior data scientists.
Cummins Product Reliability Analytics: I lead reliability data management and reliability prediction for Cummins products, supporting reporting to regulatory organizations and enabling business units to perform financial planning for warranty accrual. This work draws on core statistical survival analysis methods such as Weibull fitting.
HR Data Analytics: I developed HR analytics models to predict attrition rates and conducted descriptive analytics on exit interview data from former employees to provide insights for the annual board meeting.
Gen-AI Based Claims Inspection for Component Failure Mode Prediction: This project is based on Gen-AI technology and includes a RAG pipeline and fine-tuning of pretrained Hugging Face LLMs to predict the “failure mode bucket” indicating the root cause of failure captured in warranty claims for turbochargers. My contributions include designing the RAG pipeline, architecting the overall project structure, training and evaluating the pretrained Hugging Face LLMs, mentoring junior data scientists, and business communication. A POC of this project was also submitted to the Databricks Gen-AI hackathon: https://www.youtube.com/watch?v=phRYOgJjGpE&t=9s
Gen-AI Based Data Transformation for Attribute Based Pricing of Products: This project is based on Gen-AI technology and includes an agentic AI pipeline that uses LLMs to transform the data required for product pricing analytics, a task that is tedious and time-consuming when done manually. It delivered savings of 1,800 hours of manual effort.
Principal IoT Systems Engineer, Contractor-Cummins, IN, 06/2017 – 02/2021
Telematics Data Generation: Led cloud-based integration of IoT data pipelines from various partners into the Cummins cloud to collect and store telematics data from on-highway vehicles.
NOx Sensor Prognostics Using IoT Data: Led the development of physics-inferred NOx sensor prognostics based on IoT data, using machine learning algorithms to predict failure of these sensors. Filed a global patent on this solution.
Connected Solutions Engineer at Harman International, NJ, 05/2015 – 06/2017
Development of an automation framework for high-end connected cars in infotainment using VBScript and C# .NET for embedded/firmware systems, remote connectivity, data serialization using JSON-RPC, and communication with firmware over Ethernet
Data analytics for functional and performance characterization of tuner & satellite radio (AM/FM/HD, Sirius XM), audio data processing, OCR UI processing, over-the-air updates, navigation, traffic updates, and statistics collection and reporting
Senior Associate at Cognizant, Michigan, 09/2014-05/2015
Powertrain system design, modeling, and simulation
Data analytics, data visualization for performance monitoring and calibration tuning
Project lead: task allocation, planning, estimation, status tracking, management reviews, delivery ownership, performance evaluation, risk management, costing and profit margin evaluation, conducting technical interviews for new hires and giving them project orientation, customer interaction, and execution of proof-of-concept projects.
Asst Consultant, TCS, Michigan, 10/2006-11/2012
Project lead for embedded systems-based powertrain controls development for fuel economy optimization and diagnostics features, including MATLAB/Simulink model development and simulation
Development of data visualization tools for fuel economy improvement, emissions, component life estimation, for off-board calibrations
Development of multi-threaded applications for HEVs, model-based development, and auto-coding
Single point of contact between customer and team, project management, team performance evaluation
Software Engineer, KPIT, 07/2004-10/2006
Software development of automotive applications for CAN communication and KWP-based diagnostics protocols
Education
Master of Science, Data Science, Indiana University Bloomington, IN, Spring 2025, GPA: 3.69/4
Bachelor of Engineering, Pune University, India, May 2004
Certifications
Microsoft Azure Foundations, #H867-6488
Purdue Badge Program Certification in Data Science II
Neural Network and Deep Learning, 01/09/21, #Z33UF79M2ZLV, deeplearning.ai
Machine Learning Using Python Certification, 03/15/20, #6KZMS9K8QP5M, IBM
Architecting Software for Smart Internet of Things, 09/01/19, #J9UQ78E4FD4N
Industrial IoT Markets and Security Certification, 03/01/19, University of Colorado Boulder
Additional Information
Academic Projects (M.S. in Data Science):
Home Loan Credit Defaulter Prediction using Machine Learning & Deep Learning
Jeopardy Contest Training Game Dev using NLP based Recommender Model
"What's My Worth": Data Management & Insights of "AI & Data Field Salaries"
Visualizing Human Genetics Disorders Using Parents' history
Image Classification using Fruits and Nuts Dataset for Computer Vision based Grocery Store Checkout
Research Assistance for ‘Mapping the Landscape of Generative AI in Retail: A Machine Learning-Driven Systematic Review to Uncover Existing Knowledge, Methods and Research Gaps’ paper to be presented at ITAA (International Textile & Apparel Association)