Data Scientist Machine Learning

Location:

Rockwall, TX, 75032

Posted:

February 23, 2025

Resume:

SaiReddy Thatiparthi

Data Scientist

Email:**************************@*****.***

Phone: +1-412-***-****

PROFESSIONAL SUMMARY:

Data Scientist and Mentor with 5+ years of experience in Python, Machine Learning, Deep Learning, MLOps, Data Visualization, Big Data technologies, Generative AI, LLMs, RAG and Prompt Engineering

Hands-on experience with Classification, Regression, Clustering, Computer Vision, Natural Language Processing and Transfer Learning models to solve challenging business problems

Expertise in SQL, MySQL, TensorFlow, Scikit-learn, Keras, PyTorch, Gensim, NLTK, spaCy, Hugging Face, OpenAI, LangChain, Pandas, NumPy, Matplotlib, Seaborn, Plotly, Tableau and Power BI for delivering data-driven solutions

Proficient in implementing and optimizing Vector Databases like Pinecone and ChromaDB for efficient similarity search, RAG and large-scale AI applications

Experienced in developing, fine-tuning, and deploying Large Language Models (LLMs) such as GPT, BERT, T5, Dolly, Dolly-E, Gemini, and LLaMA for advanced NLP applications and AI-driven solutions

Experienced in Big Data technologies such as Apache Spark, Hadoop, HDFS and Databricks for scalable data processing, distributed computing and efficient model training on large datasets

Skilled in developing and deploying complex cloud-based ML/AI models, conducting in-depth data analysis and translating insights into actionable business strategies using AWS SageMaker, Bedrock, GCP Vertex AI, Azure, Docker, Jenkins and Kubernetes

Skilled in MLOps frameworks and tools for model lifecycle management, including CI/CD pipelines, monitoring and deployment using MLflow, Kubeflow, Airflow

Proficient in building conversational AI chatbot solutions using Rasa and Dialogflow

EDUCATIONAL DETAILS:

Trine University, Angola, IN, USA Dec 2023

Master of Science in Information Technology

TECHNICAL SKILL SETS:

Programming Languages

Python, Java, SQL, MySQL

Vector Databases

Pinecone, ChromaDB

Hadoop Ecosystem

Hadoop, HDFS, MapReduce, Hive, Impala, Sqoop, Kafka, Spark Core, Spark Streaming, Snowflake

Web Frameworks

Django

Data Visualization Libraries

Pandas, NumPy, Matplotlib, Seaborn, Power BI, Plotly, Tableau

Frameworks

TensorFlow, Scikit-learn, Keras, PyTorch, OpenAI, LangChain

CI/CD Tools

Jenkins, Git

Cloud Technologies

AWS S3, EC2, RDS, SageMaker, Bedrock, Lambda, GCP Vertex AI

AI/ML Models

Regression, Classification, Clustering, Deep learning, Transfer Learning models, Generative AI, LLM, RAG, Prompt Engineering

LLM Models

GPT, BERT, T5, Gemini, LLaMA, Dolly, Dolly-E

Natural Language Processing

SpaCy, NLTK, RASA, Dialogflow, HuggingFace

Containers

Docker and Kubernetes

Monitoring and Automation

MLflow, Airflow, Kubeflow

CERTIFICATE:

Python Life Certified Full stack Data Science September-2017

Verify: https://pythonlife.in/data-science.html

Machine Learning Practical Workout with Real-world Projects 2024

Verify: https://ude.my/UC-5de09420-44d9-4452-a970-6659d44b97b

Generative AI – Natural Language Processing (NLP) Bootcamp 2024

Verify: http://ude.my/UC-878914ec-dca1-4fdd-9e67-a2d5f810d4ba

Artificial Intelligence Build LLM & ChatGPT 2024

Verify: http://ude.my/UC-639310cb-4648-4996-8bb0-dfb3192bcbb1

CERTIFICATION:

AWS Certified Machine Learning Engineer – Associate Dec 06, 2024 – Dec 06, 2027

Verify: https://www.credly.com/users/saireddy-thatiparthi

AWS Certified Machine Learning – Specialty Dec 28, 2024 – Dec 28, 2027

Verify: https://www.credly.com/users/saireddy-thatiparthi

PROFESSIONAL EXPERIENCE:

American Airlines – Dallas, Texas January 2024 - Present

Data Scientist

Responsibilities:

Built and deployed ML models using AWS SageMaker, achieving 95% prediction accuracy

Designed and implemented Spark-based real-time data pipelines, reducing data processing time by 30%

Developed a machine learning model to predict flight delays using factors such as weather, airport traffic and historical flight performance

Applied classification and regression models, Prophet, ARIMA and LSTMs to improve predictions of flight demand and future ticket sales and to optimize scheduling

Implemented a Sentiment analysis tool using NLP to analyze customer feedback, reviews and complaints

Worked with big data platforms such as Databricks and Snowflake to analyze large-scale flight and operational data

Designed and deployed robust Machine Learning and AI models on GCP

Created dashboards for the customer success team to track churn risk and take proactive action

Implemented A/B testing for feature engineering and improved model performance by 20%

Optimized data analysis workflows, reducing decision-making time by 25% for the client

Designed and deployed AI-driven Chatbots to handle customer queries, reducing response time and improving satisfaction

Automated CI/CD pipelines for model deployment using MLflow, reducing manual intervention by 50%

Designed and implemented text-to-image generation pipelines using Hugging Face’s Diffusers and LLMs to generate realistic visuals from text prompts

Developed a recommendation engine using collaborative filtering, improving user engagement by 25%

Designed LLM-based solutions for automation in airline regulatory reporting and document processing

Designed real-time anomaly detection systems for airline maintenance and operational efficiency

Gathered and organized large and complex data assets and performed relevant analyses and modelling

Collaborated with cross-functional teams to develop AI-powered solutions using AWS Bedrock, driving actionable business insights

Environments: Hive, Sqoop, Storm, Kafka, HDFS, AWS SageMaker, EC2, S3, Bedrock, YARN, MapReduce, RDBMS, Databricks, DynamoDB, MongoDB, Snowflake, NLTK, Gensim, spaCy, Scikit-learn, TensorFlow, PyTorch, MLflow, Docker, Kubernetes, Keras, Matplotlib, Seaborn, Plotly, Rasa, Dialogflow, Hugging Face Transformers, GPT, BERT, Google Cloud Platform, Classification, Regression, Clustering, Kubeflow, LlamaIndex

Infosys

Data Scientist Dec 2020 - August 2022

Responsibilities:

Extracted and processed large volumes of structured and unstructured data from diverse sources, ensuring data quality and integrity for model training

Optimized models using Random Forest, Decision Trees, XGBoost and LightGBM, achieving 85% accuracy

Implemented distributed machine learning models for large-scale data processing

Created interactive dashboards and visualizations to communicate complex data insights to stakeholders and inform decision-making

Developed statistical and predictive models to forecast business metrics

Implemented time-series forecasting models for trend analysis

Performed data wrangling, feature engineering and data augmentation to improve model accuracy

Built scalable big data pipelines using Spark and Hadoop, reducing ETL processing time

Developed machine learning pipelines using Kubeflow, enabling seamless integration between training and deployment

Implemented end-to-end ML pipelines, including data preprocessing, model training, and deployment using GCP tools, ensuring high performance and reliability in production environments

Automated model retraining pipelines using Airflow, ensuring model accuracy over time

Converted legacy MapReduce pipelines to Spark transformations, improving data processing efficiency by 30%

Developed NLP models for text analytics, sentiment analysis, document classification and chatbot development

Prepared and delivered data-driven presentations to senior leadership, aiding strategic decision-making

Led and collaborated with diverse teams, fostering a culture of innovation and shared success

Environment: Python, Spark, Hadoop, Kafka, Hive, Matplotlib, Tableau, Scikit-learn, MySQL, SQL, MLflow, AWS SageMaker, NLTK, spaCy, TensorFlow, Jenkins, Google Cloud Platform

Hexaware Jan 2019 - Dec 2020

Data Scientist

Responsibilities:

Performed exploratory data analysis (EDA) and statistical testing to understand data trends and provide actionable business insights

Implemented supervised and unsupervised machine learning models

Worked with large datasets using SQL, MySQL, Pandas and PySpark

Developed and implemented innovative feature engineering techniques to enhance model performance and generate actionable insights

Cleaned, preprocessed and analyzed structured and unstructured data for ML applications

Implemented basic CI/CD pipelines for machine learning models

Deployed models using Docker and Kubernetes

Automated reporting pipelines, boosting team efficiency by 30%

Collaborated with the team to design a new statistical analysis model

Created reports and dashboards using Power BI, Tableau and Matplotlib

Supported big data processing using Apache Spark, Hadoop, AWS and GCP

Designed MLflow pipelines for model tracking and monitoring

Handled requests from the marketing department and inquiries from Customer Service related to statistics on existing data

Collaborated with cross-functional teams across data science, business and data engineering

Actively contributed to team knowledge-sharing initiatives, mentoring junior members in machine learning and data visualization techniques

Environment: Python, TensorFlow, Scikit-learn, SQL, MySQL, AWS SageMaker, S3, NumPy, Pandas, Matplotlib, Machine learning, MLflow, Docker, Kubernetes, Airflow, Apache Spark, Hadoop, AWS, GCP, Power BI, Tableau
