Data Scientist Machine Learning

Location: Plano, TX
Salary: 80000
Posted: October 15, 2025

Resume:

APOORVA NANABOLU

Contact: +1-469-***-**** Email: *****************@*****.*** www.linkedin.com/in/apoorvarn-137256295

Aspiring Data Scientist & AI/ML Engineer, Full-Stack Analytics Professional

Professional Summary:

Distinguished Computer Science graduate with expertise in enterprise-scale Machine Learning, Generative AI, and cloud-native data architectures across finance, healthcare, insurance, and retail. Proven record of $20M+ business impact through end-to-end MLOps/LLMOps, statistical modeling, feature engineering, production deployment, and real-time monitoring in AWS, Azure, and GCP. Advanced practitioner in AI technologies including Large Language Models, transformers, generative AI systems, LangChain-powered conversational AI, GraphRAG knowledge retrieval, and AI agent workflows. Skilled in designing scalable ETL/ELT pipelines, streaming systems, and 100TB+ data lake infrastructures with 99.9% reliability. Deep expertise in statistical analysis, causal inference, time series forecasting, and reinforcement learning, with a continued focus on Generative AI, LLMOps, and ethical machine learning practices.

Technology Stack:

Core Programming: Python, R, SQL

DS Foundations: Predictive modeling, clustering, classification

Databases: SQL, NoSQL, Snowflake

NLP/LLMs: GPT, BERT, Hugging Face, LangChain, LangGraph, LangSmith, CrewAI, AutoGen; POCs on MCP, n8n & CursorAI

Vector DBs: Pinecone, Milvus, Faiss, Chroma

Gen AI: Prompt Engineering techniques (Chain-of-Thought, Guardrails, Few-shot, etc.)

MLOps & Deployment: MLflow, Docker, Kubernetes, Airflow, Kedro, FastAPI

Cloud Platforms: AWS (SageMaker, Lambda, S3), Azure (OpenAI, AKS), GCP (Vertex AI)

Testing/Explainability: SHAP, LIME, Red Teaming, CI/CD

Visualization: Power BI, Tableau, Plotly

Security & Ethics: HIPAA, PII redaction, HITL systems, ethical ML practices

Version Control & Collaboration: Git, Bitbucket

Professional Experience:

PayPal Generative AI Engineer (Senior Data Scientist) Aug 2024 – Present

Tech Stack: Python, TensorFlow, PyTorch, AWS SageMaker, MLflow, Kubernetes, LangChain, Pinecone, Delta Lake, FastAPI, AWS Lambda, Grafana

Environment: TensorFlow, PyTorch, AWS SageMaker, LangChain

Key Achievements

Designed and deployed a large-scale Generative AI–driven fraud detection system using LLMs (GPT-4, Claude) with vector search, increasing detection accuracy by 40% across more than 10M daily financial transactions.

Built a streaming risk assessment framework leveraging transformers and Graph Neural Networks, cutting false positives by 55% while maintaining over 99% precision in high-risk detection.

Delivered an AI-powered customer service chatbot with RAG (LangChain + Pinecone) that processed 50K+ queries/day, achieving 92% resolution accuracy and saving $2M annually in support costs.

Enhanced credit risk modeling with ensemble and deep learning methods, boosting loan approval accuracy by 30% while meeting GDPR, PCI-DSS and banking compliance requirements.

Responsibilities & Contributions:

Implemented MLOps pipelines on AWS SageMaker with MLflow + Kubernetes, enabling automated retraining, CI/CD, and sub-200ms inference latency with 99.9% uptime.

Developed production-ready Generative AI APIs (FastAPI + AWS Lambda) handling 1M+ daily requests with auto-scaling and end-to-end monitoring via CloudWatch & Grafana.

Designed Delta Lake-based feature engineering workflows for structured + unstructured financial data, embedding automated data validation and exception handling.

Established model validation & explainability frameworks (A/B testing, SHAP, LIME, causal inference) ensuring statistical rigor and regulatory transparency.

United Healthcare Machine Learning Engineer (Data Scientist) Jul 2023 – Apr 2024

Tech Stack: Python, PyTorch, Azure ML, Kubernetes, Terraform, Azure Data Factory, BERT/Transformers, Power BI, Docker, React

Environment: PyTorch, Azure ML, Kubernetes, Terraform, Power BI

Key Achievements

Engineered an end-to-end patient outcome prediction system analyzing 500K+ patient records, achieving 87% accuracy in hospital readmission forecasting and lowering costs by 25% through proactive care strategies.

Built a real-time clinical decision support system leveraging computer vision models (CT, MRI, X-rays), achieving 94% diagnostic accuracy and reducing radiologist workload by 40%.

Designed scalable healthcare data pipelines (Azure Data Factory + Terraform) integrating EHR, PACS, and LIS systems, processing 100TB+ medical data monthly under strict HIPAA compliance.

Applied NLP with BERT and domain-specific transformers to 1M+ clinical notes, extracting actionable medical insights with a 91% F1-score in entity recognition and relationship extraction.

Developed an AI-driven resource allocation system using reinforcement learning, improving hospital bed utilization by 22% and cutting patient wait times by 35% across multiple facilities.

Responsibilities & Contributions:

Deployed production-grade ML services with Docker + Azure Kubernetes Service, implementing blue-green deployments and automated rollback for zero-downtime model updates.

Established a data governance & privacy framework, applying differential privacy and federated learning to enable secure multi-institutional collaborations.

Created interactive clinical dashboards (Power BI + React) to deliver real-time predictive insights to clinicians, enhancing evidence-based decision-making.

Built automated model monitoring and drift detection pipelines using Azure Monitor and custom Python frameworks, ensuring reliable performance and triggering retraining workflows as needed.

TATA AIG Insurance Data Scientist Jan 2021 – Jul 2022

Tech Stack: Python, XGBoost, TensorFlow, Vertex AI, BigQuery, Apache Kafka, Docker, Google Cloud Vision AI, Vertex AI Feature Store

Environment: XGBoost, Vertex AI, BigQuery, Apache Kafka, Docker

Key Achievements

Built an intelligent underwriting system leveraging gradient boosting & neural networks, processing 200K+ monthly insurance applications and improving risk assessment accuracy by 45%, cutting decision times from days to minutes.

Developed an AI-powered fraud detection platform using graph analytics & anomaly detection, identifying hidden fraud patterns with 89% precision, preventing $10M+ fraudulent payouts annually.

Designed a dynamic premium pricing engine with reinforcement learning, integrating competitor & customer behavior data to optimize rates—boosting policy retention by 18%.

Created customer lifetime value (CLV) prediction models using survival analysis & ML techniques, driving targeted campaigns and increasing cross-sell revenue by 28%.

Deployed a real-time claims processing system (Vertex AI + BigQuery), reducing settlement time by 60% while maintaining high accuracy in liability & damage assessment.

Responsibilities & Contributions:

Engineered a scalable event-driven data infrastructure (Google Cloud Dataflow + Pub/Sub), supporting 10K+ concurrent insurance transactions across structured/unstructured datasets.

Established an enterprise-grade feature store with Vertex AI, standardizing feature engineering pipelines and cutting model development time by 40%.

Designed a model governance framework using Vertex AI Pipelines, ensuring reproducible training, validation, and deployment with automated compliance & audit checks.

Built an intelligent document processing system with Google Cloud Vision AI + NLP models, achieving 95% accuracy in automated document extraction & validation.

Flipkart Associate Data Engineer (Analytics) Jan 2020 – Dec 2021

Tech Stack: Python, Pandas, AWS (S3, Glue, Athena, Redshift, Lambda), Apache Airflow, Streamlit, AWS Amplify, SQL, Power BI

Environment: Scikit-learn, AWS Redshift, Streamlit, Apache Airflow

Key Achievements

Partnered with analytics teams to translate business requirements into scalable data pipelines, improving delivery speed of insights by 40%.

Migrated legacy reporting jobs into cloud-native ETL workflows (Airflow + AWS Glue), reducing manual intervention by 80%.

Built a data quality validation framework with automated anomaly detection, ensuring clean datasets for 50+ analytics use cases.

Developed curated data marts in Redshift optimized for marketing & merchandising teams, cutting query execution times from minutes to seconds.

Contributed to the infrastructure-as-code setup (Terraform + AWS) for new data pipelines, enabling repeatable deployments across environments.

Analytics: Conducted exploratory and diagnostic analysis on customer purchase patterns, providing insights that directly informed e-commerce growth strategy.

SQL: Optimized complex Redshift SQL queries with partitioning and indexing strategies, improving query efficiency by 35% across large datasets.

Designed interactive dashboards in Power BI, integrating pipeline outputs with KPIs to support executive decision-making.

Responsibilities & Contributions:

Designed and documented end-to-end ETL flows (raw → staging → curated layers) with lineage tracking and schema evolution handling.

Integrated real-time clickstream data into Redshift and Athena, supporting personalization and funnel analytics at scale.

Built monitoring dashboards with Streamlit to track pipeline health, SLA breaches, and data freshness in near real time.

Automated partitioning, indexing, and compression strategies in Redshift to lower storage costs and improve performance by 30%.

Supported cross-functional data governance efforts by standardizing metadata, cataloging datasets, and enforcing role-based access policies.

Core Competencies & Leadership Traits:

Analytical Thinking & Problem Solving: Exceptional ability to break down complex business challenges into actionable data science solutions.

Strategic Vision & Innovation: Forward-thinking approach to identifying emerging trends and implementing cutting-edge technologies.

Collaborative Leadership: Proven skills in leading cross-functional teams and fostering inclusive, high-performance work environments.

Adaptability & Learning Agility: Rapid acquisition of new technologies and methodologies with demonstrated flexibility in dynamic business environments.

Communication Excellence: Strong ability to translate technical concepts into business insights for diverse stakeholder audiences.

Results-Driven Execution: Consistent track record of delivering measurable business value through data-driven initiatives and optimization strategies.

Ethical Decision Making: Commitment to responsible AI practices, data privacy, and ethical considerations in machine learning implementations.

Continuous Improvement Mindset: Proactive approach to process optimization, knowledge sharing, and professional development initiatives.

Resilience & Composure: Ability to maintain high performance and positive attitude under pressure while managing multiple competing priorities.

Certifications:

AWS Certified Solutions Architect – Associate, Amazon Web Services (AWS).

Education:

University of New Haven, Connecticut, USA

Master of Science, Computer Science

Bhavans Vivekananda College

Bachelor of Science (BSc), MSCs


