Data Scientist Machine Learning

Location:
Raleigh, NC
Posted:
February 17, 2025

Resume:

Sushant Kotwal

Raleigh, NC ************@*****.*** +1-812-***-**** linkedin.com/in/SushantKotwal

github.com/sushantkotwal96

SUMMARY

Results-driven Data Scientist with 5+ years of experience in AI, Machine Learning, and Data Analytics. Skilled in developing high-accuracy predictive models, optimizing ETL workflows, and leveraging cloud platforms (Azure & GCP) for scalable solutions. Proficient in Python, SQL, LLMs, and advanced statistical techniques, with hands-on experience in NLP, Deep Learning, and Business Intelligence tools (Power BI, Tableau). Adept at transforming complex datasets into actionable insights, enhancing marketing strategies, and automating data pipelines to drive business impact. Proven track record of technical leadership, cross-functional collaboration, and delivering data-driven solutions that improve efficiency and decision-making.

SKILLS

•Programming Languages: Python, R, SQL, HTML, CSS, Java, PowerShell

•Databases & Version Control: MySQL, PostgreSQL, MongoDB, MSSQL Server, Git

•Libraries & Frameworks: NumPy, Pandas, Matplotlib, TensorFlow, Keras, PyTorch, Scikit-learn, Flask, NLTK

•Statistical Techniques: Hypothesis Testing, A/B Testing, ANOVA, DOE, T-tests

•Tools: Tableau, Power BI, Excel, Google Looker Studio, Azure Machine Learning Studio, Snowflake, MLflow

•Environments & Cloud Platforms: Google BigQuery, GCP Dataflow, Google Cloud Platform (GCP), Azure Data Factory, Databricks, PySpark

•Other: Large Language Models (LLMs), Exploratory Data Analysis (EDA) tools, Data Governance tools, XML, JSON, Struts

PROFESSIONAL EXPERIENCE

LexisNexis

Data Scientist

Jul 2024 – present

•Developed machine learning models and NLP pipelines using Python and PySpark on Google Cloud Dataflow, improving data accessibility and enabling more informed decision-making across teams.

•Performed statistical analysis and hypothesis testing in Python to identify key trends, which led to the strategic expansion of product coverage in five additional states.

•Designed and built interactive Looker dashboards with advanced LookML functions, streamlining financial and subscription reporting. This reduced report generation time by 40% and provided stakeholders with faster, data-driven insights.

•Used SQL and Common Table Expressions (CTEs) in BigQuery to preprocess, clean, and validate large datasets, increasing data accuracy by 25% and significantly reducing reporting inconsistencies.

•Engineered scalable ETL workflows to process 20 million metadata records using Cloud Dataflow, Cloud Composer, and distributed computing, optimizing data pipeline efficiency and ensuring seamless data availability for analytics.

•Implemented a Retrieval-Augmented Generation (RAG) approach using OpenAI’s GPT-3.5 LLM on Vertex AI, with FAISS for efficient document retrieval. This enabled automated text summarization, improved document classification accuracy, and enhanced insights from unstructured text data.

Indiana University

Data Scientist

Jul 2023 – Jun 2024

•Developed an AI-driven sign language detection model using ensemble CNNs and VGG16 in Azure ML Studio, achieving 85% accuracy in recognizing hand gestures; analyzed 120,000 images with Python and Databricks to ensure data integrity and quality.

•Automated data preprocessing using PySpark and Azure Data Factory, reducing manual effort by 40% while improving efficiency and reliability; leveraged hypothesis testing and ANOVA to refine image processing, leading to a 20% boost in model performance.

•Established rigorous data validation and quality control processes to enhance accuracy and ensure high-quality inputs; continuously monitored model performance, making adjustments based on real-time feedback and evaluation metrics.

•Effectively communicated research findings through reports and presentations, translating complex technical insights for faculty and stakeholders; collaborated closely with academic and technical teams to align the model with research objectives.

•Analyzed AI-powered chatbot conversation data using Python and statistical techniques, uncovering patterns in user interactions to improve response accuracy and optimize chatbot effectiveness.

SBI General Insurance

Data Scientist Intern

Jun 2022 – Sep 2022

•Worked with customer policy data using SQL (PostgreSQL), ensuring accuracy and completeness for meaningful analysis. Designed Tableau dashboards to present policy renewal trends and customer demographics, making insights more accessible.

•Strengthened data integrity by implementing validation processes, reducing inconsistencies for better decision-making. Optimized data retrieval, cutting query time by 40%, which helped the team access critical information faster.

•Built and trained classification models (Decision Trees, Random Forest, XGBoost) to predict policy renewals with 70% accuracy. Conducted A/B testing on renewal communication strategies, leading to a 20% increase in customer engagement.

•Developed a fraud detection framework by analyzing patterns in claims data and applying anomaly detection techniques. Created reports and visualizations that guided strategic decisions to improve policy renewal rates.

Infosys Ltd.

Senior System Engineer (Data Scientist)

Jul 2020 – Jul 2021

•Built a propensity model (F1-score: 69%) to predict customer responses across WhatsApp, IVR, and Messaging, helping refine targeted marketing strategies and improve engagement.

•Extracted and analyzed customer demographics and campaign data using Google BigQuery, then designed automated retraining pipelines in GCP Dataflow to ensure models stayed up to date.

•Led EDA, A/B testing, and validation to enhance predictive models, resulting in a 3x increase in CTR and a 50% reduction in messaging costs by improving audience targeting.

•Partnered with marketing, engineering, and business teams to integrate predictive insights into decision-making while mentoring junior data scientists on model development and cloud tools.

•Analyzed AI-powered chatbot interactions to uncover patterns, applied predictive modeling to improve response accuracy, and enhanced customer engagement and satisfaction.

Infosys Ltd.

System Engineer

Oct 2018 – Jun 2020

•Designed and implemented an XML parser using Java Struts to automate document generation, converting data to JSON and reducing manual effort by 35%.

•Optimized SQL queries to enhance search and filter capabilities, improving data retrieval speed and overall database efficiency.

•Developed scalable data processing and system integration solutions to support growing business needs and ensure seamless operations.

•Collaborated with cross-functional teams to gather requirements, implement enhancements, and align technical solutions with business objectives.

•Conducted thorough testing, debugging, and performance tuning while documenting processes and providing support to team members and end-users.

Microland Ltd.

Software Engineer Intern

Jan 2017 – Jun 2017

•Assisted in gathering, cleaning, and analyzing data while developing and maintaining data models to enhance business insights.

•Contributed to ETL processes using SQL and Python, collaborated with senior engineers and data scientists, and documented key data workflows.

•Supported testing and validation of data processes, identified and resolved data quality issues, and provided assistance in maintaining data systems.

•Engaged in continuous learning through training programs, staying updated on data engineering trends, and actively contributing to coding, testing, and documentation efforts.

PROJECTS

DataInsight Q&A

Python, LangChain, OpenAI LLM, FAISS, Streamlit

May 2024 – Jun 2024

•Designed a RAG pipeline for question answering using LangChain and OpenAI models, enabling context-aware responses.

•Engineered a CSV agent leveraging the Python REPL tool and FAISS vector database for embedding and retrieval operations.

Geolocate

Python, Airflow, Tweepy, GCP, Looker Studio, NLTK

Feb 2023 – May 2023

•Conducted sentiment analysis on 250K+ tweets related to European football clubs using Twitter API (Tweepy).

•Orchestrated data preprocessing workflows to extract tweet data by setting up pipelines in Apache Airflow.

EDUCATION

Master of Science in Data Science

Indiana University Bloomington

Aug 2021 – May 2023

Courses: Applied Machine Learning, Deep Learning, Data Visualization, Data Mining, Data Analysis

Bachelor of Engineering in Computer Engineering

University of Mumbai

Aug 2014 – May 2018

Courses: Database Management Systems, Data Structures, Linear Algebra & Statistics, Analysis of Algorithms

CERTIFICATES

Microsoft Certified Azure Data Scientist Associate

TensorFlow Developer Certificate


