SaiReddy Thatiparthi
Data Scientist
Email: **************************@*****.***
Phone: +1-412-***-****
PROFESSIONAL SUMMARY:
Data Scientist and mentor with 5+ years of experience in Python, Machine Learning, Deep Learning, MLOps, Data Visualization, Big Data technologies, Generative AI, LLMs, RAG and Prompt Engineering
Hands-on experience with Classification, Regression, Clustering, Computer Vision, Natural Language Processing and Transfer Learning models to solve challenging business problems
Expertise in SQL, MySQL, TensorFlow, Scikit-learn, Keras, PyTorch, gensim, NLTK, spaCy, Hugging Face, OpenAI, LangChain, Pandas, NumPy, Matplotlib, Seaborn, Plotly, Tableau and PowerBI for delivering data-driven solutions
Proficient in implementing and optimizing vector databases such as Pinecone and ChromaDB for efficient similarity search, RAG and large-scale AI applications
Experienced in developing, fine-tuning, and deploying Large Language Models (LLMs) such as GPT, BERT, T5, Dolly, Dolly-E, Gemini, and LLaMA for advanced NLP applications and AI-driven solutions
Experienced in Big Data technologies such as Apache Spark, Hadoop, HDFS and Databricks for scalable data processing, distributed computing and efficient model training on large datasets
Skilled in developing and deploying complex ML/AI models in the cloud, conducting in-depth data analysis and translating insights into actionable business strategies using AWS SageMaker, Bedrock, GCP Vertex AI, Azure, Docker, Jenkins and Kubernetes
Skilled in MLOps frameworks and tools for model lifecycle management, including CI/CD pipelines, monitoring and deployment, using MLflow, Kubeflow and Airflow
Proficient in building conversational AI chatbot solutions using Rasa and Dialogflow
EDUCATIONAL DETAILS:
Trine University, Angola, IN, USA Dec 2023
Master of Science in Information Technology
TECHNICAL SKILL SETS:
Programming Languages: Python, Java, SQL, MySQL
Vector Databases: Pinecone, ChromaDB
Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Impala, Sqoop, Kafka, Spark Core, Spark Streaming, Snowflake
Web Frameworks: Django
Data Visualization Libraries: Pandas, NumPy, Matplotlib, Seaborn, PowerBI, Plotly, Tableau
Frameworks: TensorFlow, Scikit-learn, Keras, PyTorch, OpenAI, LangChain
CI/CD Tools: Jenkins, Git
Cloud Technologies: AWS S3, EC2, RDS, SageMaker, Bedrock, Lambda, GCP Vertex AI
AI/ML Models: Regression, Classification, Clustering, Deep Learning, Transfer Learning models, Generative AI, LLM, RAG, Prompt Engineering
LLM Models: GPT, BERT, T5, Gemini, LLaMA, Dolly, Dolly-E
Natural Language Processing: spaCy, NLTK, Rasa, Dialogflow, Hugging Face
Containers: Docker, Kubernetes
Monitoring and Automation: MLflow, Airflow, Kubeflow
CERTIFICATES:
Python Life Certified Full Stack Data Science, September 2017
Verify: https://pythonlife.in/data-science.html
Machine Learning Practical Workout with Real-world Projects, 2024
Verify: https://ude.my/UC-5de09420-44d9-4452-a970-6659d44b97b
Generative AI – Natural Language Processing (NLP) Bootcamp, 2024
Verify: http://ude.my/UC-878914ec-dca1-4fdd-9e67-a2d5f810d4ba
Artificial Intelligence Build LLM & ChatGPT, 2024
Verify: http://ude.my/UC-639310cb-4648-4996-8bb0-dfb3192bcbb1
CERTIFICATION:
AWS Certified Machine Learning Engineer – Associate Dec 06, 2024 – Dec 06, 2027
Verify: https://www.credly.com/users/saireddy-thatiparthi
AWS Certified Machine Learning – Specialty Dec 28, 2024 – Dec 28, 2027
Verify: https://www.credly.com/users/saireddy-thatiparthi
PROFESSIONAL EXPERIENCE:
American Airlines – Dallas, Texas January 2024 - Present
Data Scientist
Responsibilities:
Built and deployed ML models using AWS SageMaker, achieving 95% prediction accuracy
Designed and implemented Spark-based real-time data pipelines, reducing data processing time by 30%
Developed a machine learning model to predict flight delays using factors such as weather, airport traffic and historical flight performance
Applied classification and regression models, Prophet, ARIMA and LSTMs to improve flight demand predictions and ticket sales forecasts, and to optimize scheduling
Implemented a Sentiment analysis tool using NLP to analyze customer feedback, reviews and complaints
Worked with big data platforms such as Databricks and Snowflake to analyze large-scale flight and operational data
Designed and deployed robust Machine Learning and AI models on GCP
Created dashboards for the customer success team to track churn risk and take proactive actions
Implemented A/B testing for feature engineering and improved model performance by 20%
Optimized data analysis workflows, reducing decision-making time by 25% for the client
Designed and deployed AI-driven chatbots to handle customer queries, reducing response time and improving satisfaction
Automated CI/CD pipelines for model deployment using MLflow, reducing manual intervention by 50%
Designed and implemented text-to-image generation pipelines using Hugging Face's Diffusers and LLMs to generate realistic visuals from text prompts
Developed a recommendation engine using collaborative filtering, improving user engagement by 25%
Designed LLM-based solutions for automation in airline regulatory reporting and document processing
Designed real-time anomaly detection systems for airline maintenance and operational efficiency
Gathered and organized large and complex data assets, and performed relevant analyses and modeling
Collaborated with cross-functional teams to develop AI-powered solutions using AWS Bedrock, driving actionable business insights
Environment: Hive, Sqoop, Storm, Kafka, HDFS, AWS SageMaker, EC2, S3, Bedrock, YARN, MapReduce, RDBMS, Databricks, DynamoDB, MongoDB, Snowflake, NLTK, gensim, spaCy, Scikit-learn, TensorFlow, PyTorch, MLflow, Docker, Kubernetes, Keras, Matplotlib, Seaborn, Plotly, Rasa, Dialogflow, Hugging Face Transformers, GPT, BERT, Google Cloud Platform, Classification, Regression, Clustering, Kubeflow, LlamaIndex
Infosys Dec 2020 - Aug 2022
Data Scientist
Responsibilities:
Extracted and processed large volumes of structured and unstructured data from diverse sources, ensuring data quality and integrity for model training
Optimized models using Random Forest, Decision Trees, XGBoost and LightGBM, achieving 85% accuracy
Implemented distributed machine learning models for large-scale data processing
Created interactive dashboards and visualizations to communicate complex data insights to stakeholders, informing decision-making
Developed statistical and predictive models to forecast business metrics
Implemented time-series forecasting models for trend analysis
Performed data wrangling, feature engineering and data augmentation to improve model accuracy
Built scalable big data pipelines using Spark and Hadoop, reducing ETL processing time
Developed machine learning pipelines using Kubeflow, enabling seamless integration between training and deployment
Implemented end-to-end ML pipelines, including data preprocessing, model training, and deployment using GCP tools, ensuring high performance and reliability in production environments
Automated model retraining pipelines using Airflow, ensuring model accuracy over time
Converted legacy MapReduce pipelines to Spark transformations, improving data processing efficiency by 30%
Developed NLP models for text analytics, sentiment analysis, document classification and chatbot development
Prepared and delivered data-driven presentations to senior leadership, aiding strategic decision-making
Proven ability to lead and collaborate with diverse teams, fostering a culture of innovation and shared success.
Environment: Python, Spark, Hadoop, Kafka, Hive, Matplotlib, Tableau, Scikit-learn, MySQL, SQL, MLflow, AWS SageMaker, NLTK, spaCy, TensorFlow, Jenkins, Google Cloud Platform
Hexaware Jan 2019 - Dec 2020
Data Scientist
Responsibilities:
Performed exploratory data analysis (EDA) and statistical testing to understand data trends and provide actionable business insights
Implemented supervised and unsupervised machine learning models
Worked with large datasets using SQL, MySQL, Pandas and PySpark
Developed and implemented innovative feature engineering techniques to enhance model performance and generate actionable insights
Cleaned, preprocessed and analyzed structured and unstructured data for ML applications
Implemented basic CI/CD pipelines for machine learning models
Deployed models using Docker and Kubernetes
Automated reporting pipelines, boosting team efficiency by 30%
Collaborated with the team to design a new statistical analysis model
Created reports and dashboards using Power BI, Tableau and Matplotlib
Supported big data processing using Apache Spark, Hadoop, AWS and GCP
Designed MLflow pipelines for model tracking and monitoring.
Handled requests from the marketing department and inquiries from Customer Service about statistics on existing data
Collaborated with cross-functional data science, business and data engineering teams
Actively contributed to team knowledge-sharing initiatives, mentoring junior members in machine learning and data visualization techniques
Environment: Python, TensorFlow, Scikit-learn, SQL, MySQL, AWS SageMaker, S3, NumPy, Pandas, Matplotlib, Machine Learning, MLflow, Docker, Kubernetes, Airflow, Apache Spark, Hadoop, AWS, GCP, PowerBI, Tableau