Data Scientist Machine Learning

Posted:

October 15, 2025

Resume:

PRASHANTHI NAGAVARAM

Email: *********************@*****.*** | Phone Number: +1-314-***-**** | LinkedIn

Generative AI Engineer | Machine Learning Engineer | Cloud & MLOps Specialist | Data Scientist

PROFESSIONAL SUMMARY

Generative AI Engineer and Data Scientist focused on changing how organizations think, work, and grow by turning data into useful, high-quality solutions for financial services, healthcare, and retail. My career began with an interest in how decisions shape industries. This interest led me to concentrate on machine learning and Generative AI as tools to solve complex problems and deliver clear business results. I have experience in building scalable ML pipelines, deploying LLM-based solutions, and creating responsible and transparent AI. I help teams automate business processes, improve risk management, and encourage compliant use in regulated fields. I believe successful AI leadership connects innovative technology with real-world application. This approach helps organizations move towards a smarter, ethical, and sustainable future.

Specialized in Generative AI, LLM-based architectures, and RAG pipelines, with hands-on expertise in LangChain, Hugging Face, Neo4j GraphRAG, and GPT APIs for context-aware knowledge retrieval and automation.

Proven experience in MLOps and AI deployment, building reproducible, scalable pipelines using MLflow, Docker, Kubernetes (AKS), Airflow, and SageMaker across AWS, Azure, and GCP cloud ecosystems.

Expert in predictive modeling, deep learning, and compliance analytics, leveraging tools like TensorFlow, PyTorch, XGBoost, and Scikit-learn to deliver measurable business value in regulated environments.

Adept at explainable and ethical AI development, applying SHAP, LIME, RAGAS, BLEU, and ROUGE for transparent evaluation, bias detection, and responsible AI governance.

Skilled in data engineering and visualization, integrating SQL, Snowflake, BigQuery, Tableau, Power BI, and Streamlit to transform large-scale data into actionable insights.

Recognized for cross-functional collaboration and technical leadership, aligning AI innovations with business strategy and ensuring compliance with HIPAA, GDPR, and AML frameworks.

TECHNICAL EXPERTISE

Programming & Data Science Foundations

Languages: Python, R, SQL

Techniques: Predictive Modelling, Clustering, Classification, Regression, Forecasting

Databases: SQL, NoSQL, PostgreSQL, Snowflake, Redis, Elasticsearch

AI & Gen AI Frameworks

NLP / LLMs: GPT (OpenAI, Azure OpenAI), BERT, Hugging Face Transformers, RAG, GraphRAG, LangChain, LangGraph, LangSmith, CrewAI, AutoGen

Vector Databases: Pinecone, Milvus, Faiss, Chroma, VectorSearch

Prompt Engineering: Chain-of-Thought, Guardrails, Few-shot prompting, Context-aware retrieval

AI Governance & Evaluation: RAGAS, BLEU, ROUGE, Fairlearn, AUC-ROC

Engineering & Deployment

MLOps Frameworks: MLflow, Docker, Kubernetes (AKS), Apache Airflow, Kedro, FastAPI, Azure ML, Azure Functions, SageMaker, GCP Dataflow

Deep Learning Frameworks: TensorFlow, PyTorch, Keras, CNNs, RNNs, Transformers

Data Science & Analysis Libraries: Pandas, NumPy, PySpark, Matplotlib, Scikit-learn, SciPy, Prophet, ARIMA, SARIMA

Cloud Platforms: AWS (SageMaker, Lambda, S3), Azure (OpenAI, AKS), GCP (Vertex AI, BigQuery)

Testing & Explainability: SHAP, LIME, PDP, Red Teaming, CI/CD (GitHub Actions, Azure DevOps)

Supporting Skills

Visualization & BI Tools: Tableau, Power BI, Plotly, Streamlit, Seaborn, Matplotlib

Security & Compliance: HIPAA, GDPR, AML, HITL systems, PII/PHI Redaction, Data Anonymization, Adversarial Testing

Development Practices: Agile Development, Test-Driven Development (TDD)

Version Control: Git, Bitbucket, TFS

PROFESSIONAL EXPERIENCE

Generative AI Engineer | Huntington National Bank

Banking & Financial Services | Aug 2024 – Present

Domain: Compliance Analytics & Generative AI Solutions

Environment: Python, R, SQL, LangChain, LangGraph, Hugging Face Transformers, GPT APIs, GraphRAG, Neo4j, Faiss, PyTorch, Scikit-learn, MLflow, Docker, FastAPI, Azure Kubernetes Service (AKS), Apache Airflow, AWS SageMaker, Azure OpenAI, Snowflake, Tableau, Power BI, RAGAS, BLEU, ROUGE

Project Overview:

Designed and deployed a GenAI-powered financial intelligence platform for a leading banking client, combining LLM applications, machine learning workflows, and scalable data pipelines to enhance customer experience, fraud detection, and regulatory compliance.

Key Contributions & Deliverables:

Engineered an LLM-driven document intelligence system and embedding pipelines using GPT, LangChain, and Hugging Face Transformers, enabling automated parsing of loan applications, KYC forms, and compliance reports and reducing manual review time by 60%.

Built a retrieval-augmented generation (RAG) pipeline with Faiss and Neo4j GraphRAG, integrating structured data (SQL, Snowflake) and unstructured data (PDFs, emails) for context-aware financial insights.

Applied data science techniques (classification, clustering, anomaly detection) with Scikit-learn and PyTorch to identify fraudulent patterns in transactions, improving fraud detection accuracy with explainability via SHAP/LIME.

Developed data engineering workflows with Apache Airflow and ETL pipelines, ingesting and transforming 500GB+ of daily transaction data into a Snowflake-based data lakehouse, ensuring data availability for AI/ML models.

Established MLOps practices with MLflow, Docker, and AWS SageMaker, enabling reproducible model training, CI/CD integration, and deployment at scale.

Designed interactive BI dashboards in Tableau and Power BI, providing real-time insights on loan defaults, fraud trends, and customer risk profiling for business teams.

Integrated vector databases such as Pinecone and Milvus within RAG pipelines to enable high-speed semantic search and embedding-based retrieval across financial documents.

Implemented security guardrails for AI models, including adversarial prompt testing, PII redaction, and compliance with GDPR/AML/HIPAA frameworks, ensuring responsible GenAI adoption in a regulated banking environment.

Spearheaded the deployment of compliance-aware LLM microservices using FastAPI and Azure Kubernetes Service (AKS), creating secure API access to GenAI models across enterprise systems.

Introduced retrieval-evaluation metrics such as RAGAS, BLEU, and ROUGE to measure GenAI response quality and contextual accuracy, improving reliability and trust in AI-generated insights for financial advisors.

Built a human-in-the-loop (HITL) validation workflow that enabled compliance teams to review, annotate, and approve AI outputs, strengthening accountability and audit readiness across all GenAI deployments.

Machine Learning Engineer | Elevance Health

Insurance & Risk Analytics | Aug 2021 – July 2023

Domain: Predictive Modelling & Deep Learning for Insurance Operations

Environment: Python, R, SQL, Pandas, NumPy, Scikit-learn, XGBoost, LightGBM, TensorFlow, PyTorch, MLflow, Docker, Kubernetes, FastAPI, Azure Machine Learning, Azure Functions, Azure Databricks, Snowflake, Tableau, Streamlit, SHAP, LIME, PDP

Project Overview:

Developed and deployed machine learning and deep learning solutions to modernize risk assessment, claims processing, and customer retention strategies for a global insurance provider. The project blended predictive modelling, data pipelines, and advanced analytics for scalable, production-grade outcomes.

Key Contributions & Deliverables:

Designed predictive risk scoring models using XGBoost, LightGBM, and Scikit-learn, improving underwriting efficiency by accurately modelling claim probability and policyholder risk factors.

Implemented deep learning models (CNNs, RNNs) with TensorFlow and PyTorch for automated claims document classification and fraud pattern detection in text and image data.

Conducted data analysis and feature engineering on 20M+ policyholder records using Python libraries (Pandas, NumPy) and SQL, extracting behavioural and transactional features that improved model accuracy.

Established MLOps framework with MLflow, Docker, and Azure ML, automating model training, versioning, and deployment into production.

Applied explainable AI techniques (SHAP, LIME, PDP) to ensure transparency in risk predictions and compliance with regulatory frameworks.

Implemented Pinecone-based vector storage to manage embeddings generated from claims data and clinical notes, enabling efficient similarity search and knowledge discovery.

Built embedding generation and update workflows using TensorFlow and BERT models to continuously enrich feature stores with semantic context.

Created interactive dashboards in Tableau and Streamlit for underwriting teams, providing real-time insights on claim likelihood, fraud risks, and customer churn analysis.

Collaborated with cross-functional teams to establish frameworks and best practices that support future AI and ML initiatives.

Leveraged containerization technologies like Docker and orchestration tools like Kubernetes for deploying machine learning applications in cloud environments.

Implemented policyholder churn prediction APIs with FastAPI and Azure Functions, integrating predictive insights directly into CRM and retention workflows.

Implemented the transition from rule-based systems to probabilistic ML models, reducing manual risk classification errors and enhancing claims automation accuracy across business units.

Data Scientist | UnitedHealth Group

Healthcare Analytics | Aug 2020 – July 2021

Domain: Clinical Data Science & Predictive Healthcare Insights

Environment: Python, R, SQL, PySpark, Pandas, NumPy, Scikit-learn, XGBoost, LightGBM, Prophet, ARIMA, SARIMA, Apache Airflow, GCP BigQuery, GCP Dataflow, Docker, MLflow, Tableau, Power BI, Matplotlib, FHIR, HIPAA Compliance

Project Overview:

Built a healthcare analytics and decision-support platform that combined clinical data pipelines, statistical modelling, and predictive analytics to optimize patient care, hospital resource allocation, and compliance reporting.

Key Contributions & Deliverables:

Engineered ETL pipelines with PySpark, Apache Airflow, SQL, and GCP BigQuery to integrate 10M+ patient records from EHR systems, IoT devices, and lab systems, covering both structured and unstructured data (PDFs, images) and ensuring secure, HIPAA-compliant data flow.

Designed predictive models using Scikit-learn, XGBoost, and LightGBM for early disease detection (diabetes, cardiovascular risk) and hospital readmission prediction.

Applied time-series forecasting tools (Prophet, ARIMA, SARIMA) to model patient admission rates, enabling hospitals to optimize bed capacity and staffing schedules.

Built risk stratification pipelines that segmented patients based on demographics, lifestyle, and medical history, improving targeted care management strategies.

Combined patient data from multiple hospital systems using SQL and FHIR standards, creating a unified dataset for analysis and reporting.

Applied data anonymization and encryption techniques in Python to ensure patient privacy and maintain HIPAA compliance.

Created chronic disease forecasting models with time-series algorithms to predict long-term health trends.

Used statistical analysis and visualization libraries like NumPy and Matplotlib to study treatment outcomes and medication effectiveness.

Delivered interactive dashboards for clinicians and administrators, providing insights into patient outcomes, operational KPIs, and compliance monitoring.

Associate Data Scientist (Analyst) | Flipkart

Retail & E-Commerce Analytics | Jan 2020 – June 2020

Domain: Customer Behavior Modeling & Demand Forecasting

Environment: Python, SQL, Pandas, NumPy, Scikit-learn, XGBoost, LightGBM, CatBoost, TensorFlow, PyTorch, Prophet, Streamlit, Matplotlib, Seaborn, Plotly, Snowflake, AWS S3, Apache Airflow

Project Overview:

Developed a retail customer analytics and demand optimization platform that leveraged Python-based machine learning models, data pipelines, and visualization to improve customer retention, optimize pricing, and streamline supply chain operations.

Key Contributions & Deliverables:

Designed ETL workflows with Python, Airflow, and Snowflake to ingest and process 5M+ daily sales transactions, customer interactions, and marketing campaign data across multiple retail channels.

Built customer segmentation models (K-Means, DBSCAN, hierarchical clustering) to identify high-value customer groups, enabling personalized promotions and improving campaign ROI by 30%.

Developed recommendation engines using collaborative filtering and deep learning (TensorFlow, PyTorch), which boosted cross-sell and upsell conversions by 18%.

Applied classification models (XGBoost, LightGBM, CatBoost) to predict customer churn and designed retention strategies, leading to a 20% reduction in attrition.

Created interactive dashboards with Streamlit, Matplotlib, and Seaborn to visualize sales trends, campaign performance, and customer engagement in real time.

Conducted feature engineering and A/B testing to identify impactful marketing factors and improve model precision across promotions and pricing strategies.

Partnered with supply chain teams to forecast product demand and restock cycles using Prophet and time-series regression, reducing stockouts and improving delivery efficiency.

Implemented automated data validation scripts in Python to monitor data quality and detect missing or inconsistent values across product catalogs and transactional feeds, improving model input reliability.

Automated pricing optimization pipelines using elasticity modelling and regression analysis, dynamically adjusting prices and improving margins by 10%.

EDUCATION

Master of Science, Computer Information Systems & Analysis, University of Central Missouri, 2023 – 2025

Bachelor's in Engineering, Osmania University, 2017 – 2021

CERTIFICATIONS

Microsoft Certified: Azure AI Engineer Associate

Microsoft Certified: Azure Data Fundamentals (DP-900)

Microsoft Certified: Azure Fundamentals (AZ-900)

CORE COMPETENCIES & LEADERSHIP STRENGTHS

Analytical Thinking & Problem Solving: Skilled at breaking complex data and AI challenges into actionable insights using statistical reasoning and data-driven decision frameworks.

Strategic Vision & Innovation: Forward-thinking in adopting emerging AI technologies like LLMs, RAG systems, and multimodal analytics to drive business transformation.

Collaborative Leadership: Proven ability to lead cross-functional teams of data scientists, engineers, and business analysts in delivering high-impact AI solutions.

Adaptability & Learning Agility: Quick to master new frameworks, cloud platforms, and AI methodologies in fast-paced enterprise environments.

Communication Excellence: Adept at translating complex ML and GenAI concepts into clear business value for stakeholders and executives.

Results-Driven Execution: Demonstrated success delivering measurable outcomes through AI-driven automation, predictive analytics, and optimization.

Ethical Decision Making: Committed to responsible AI development, ensuring fairness, privacy, and compliance with HIPAA, GDPR, and AML standards.

Continuous Improvement Mindset: Dedicated to refining processes, enhancing model efficiency, and integrating best practices in MLOps and deployment.

Resilience & Composure: Maintain high performance and precision under pressure, managing competing priorities in data-intensive environments.

ACADEMIC & RESEARCH PROJECTS

Deep Learning for Sentiment Analysis and Text Generation: https://github.com/Prashanthi0205/SentimentAnalysis-

Collected and pre-processed large textual datasets using tokenization, padding, and embedding techniques. Designed 1D CNN models and fine-tuned Transformer architectures (BERT, GPT) to achieve a 93% F1-score in sentiment classification and text generation tasks. Used RNN and LSTM models for capturing long-term dependencies in text sequences.

Tools & Technologies: Python, TensorFlow, PyTorch, Keras, Word2Vec, GloVe, CNN, RNN, LSTM, BERT, GPT, Transformers.

Brain Tumor Detection and Segmentation using Deep Learning: https://github.com/Prashanthi0205/BrainTumorDetection

Collected and pre-processed MRI brain scans applying skull stripping, normalization, and augmentation. Built CNN classifiers and used transfer learning to improve tumor detection accuracy. Achieved 96% classification accuracy and high Dice coefficient for segmentation, outperforming traditional methods.

Tools & Technologies: Python, TensorFlow, Keras, CNN, transfer learning, medical imaging preprocessing, data augmentation.

Credit Card Fraud Detection System: https://github.com/Prashanthi0205/FraudDetection

Designed an end-to-end fraud detection pipeline for a highly imbalanced credit card transaction dataset. Implemented data cleaning, preprocessing, and feature engineering including missing value imputation, scaling, and PCA. Built stacked ensemble models that improved F1-score by 94% and AUC-ROC by 96% over baselines. Applied model explainability techniques for interpretability in financial decision-making.

Tools & Technologies: Python, Logistic Regression, SVM, Random Forest, XGBoost, LightGBM, Neural Networks, PCA, SHAP, LIME, ensemble learning.

Medical Chatbot with PDF Handling using LangChain: https://github.com/Prashanthi0205/MedicalChatbot

Developed a medical chatbot leveraging large language models and LangChain to answer complex queries. Implemented a PDF ingestion pipeline for extracting and embedding medical documents to enable context-aware responses. Used retrieval-augmented generation for reliable, source-grounded medical answers with an interactive user interface.

Tools & Technologies: Python, LangChain, large language models, PDF processing, Retrieval-Augmented Generation (RAG), chatbot development.
