
Machine Learning Data Science

Location:
Plano, TX
Salary:
90000
Posted:
October 15, 2025


Resume:

Name: Neha Nadiminti | Email: **************@*****.*** | Phone: 682-***-**** | LinkedIn

AI Research Engineer | Machine Learning & Data Science Specialist

Building scalable AI systems with Generative AI, Predictive Modeling, and Cloud-native MLOps

Innovative AI/ML professional with a track record of delivering production-grade machine learning solutions across finance, healthcare, and enterprise platforms. Experienced in designing scalable ML pipelines, prototyping GenAI applications, and enabling end-to-end automation with modern MLOps workflows. Recognized for rapidly adopting emerging AI frameworks (LLMs, GANs, VAEs, LangChain) and integrating them into business-facing applications with measurable impact.

Being an AI Research Engineer is more than a role to me; it is a way to shape how organizations think, act, and grow. My path into Data Science and AI was intentional, driven by curiosity about intelligence and its potential to solve real problems. I thrive at the intersection of engineering and innovation, where ideas become scalable pipelines, predictive models, and intelligent systems that people can rely on. Each project is an opportunity to turn raw data into decisions that matter, whether guiding financial strategies, improving healthcare outcomes, or helping businesses see patterns they could not see before. I am both a builder and a learner, constantly exploring new frameworks and refining workflows, always aiming to create systems that are technically sound and meaningful in their impact. Achievements include:

Deployed ML pipelines at scale, reducing model deployment cycle times by up to 70% using MLflow, Kubernetes, and Terraform.

Pioneered GenAI prototypes with OpenAI APIs, Hugging Face, LangChain, enabling document summarization, conversational AI, and knowledge retrieval.

Optimized financial forecasting models with Prophet, TFT, and ARIMA, improving prediction accuracy by 15–20%.

Designed real-time APIs and dashboards (FastAPI, Tableau, Power BI) that enhanced executive decision-making across multiple domains.

Profile Summary

Experienced in LLM & GenAI frameworks: GPT, BERT, Hugging Face Transformers, LangChain, LangGraph, CrewAI, Vertex AI, AutoGen, applied to conversational AI, document intelligence, and recommendation systems.

Knowledgeable in generative modeling techniques: GANs, VAEs, and diffusion models for synthetic data, text, and image generation.

Proficient in Python, R, and SQL, applying advanced predictive modeling, classification, clustering, and segmentation techniques.

Skilled in AI/ML libraries: Scikit-learn, XGBoost, LightGBM, CatBoost, TensorFlow, PyTorch for scalable ML development.

Expertise in time-series forecasting (ARIMA, SARIMA, Prophet, TFT) for demand planning, fraud and anomaly detection, and financial risk modeling.

Hands-on with MLOps practices: MLflow, Docker, Kubernetes, Apache Airflow, Kedro, FastAPI, Terraform for automation and deployment.

Proficient in data engineering workflows: ETL pipeline building, Snowflake, SQL, NoSQL, MongoDB, with batch and streaming data support.

Strong visualization & reporting skills with Power BI, Tableau, Matplotlib, Seaborn, and Plotly for analytical storytelling.

Advocate of explainable and ethical AI: applying SHAP, LIME, PDP, AUC-ROC, BLEU, ROUGE, RAGAS metrics, adversarial prompt testing, HITL systems.

Cloud-native deployment experience across Azure (OpenAI, AKS), AWS (S3, Lambda, SageMaker), and GCP (Vertex AI, BigQuery, Dataflow).

Collaborative and innovation-driven professional, adept at ideation, MVP prototyping, and integrating AI into user-facing products.

Core Competencies

CATEGORY: SKILLS & TOOLS

PROGRAMMING LANGUAGES: Python, R

AI/ML LIBRARIES: Scikit-learn, XGBoost, LightGBM, TensorFlow, PyTorch, CatBoost

MACHINE LEARNING TECHNIQUES: Predictive Modeling, Segmentation Modeling, Classification, Clustering

NLP/LLMs: BERT, GPT, OpenAI APIs, Hugging Face Transformers, LangChain, LangGraph, CrewAI, LangSmith, AutoGen, FastText, Faiss, Chroma, Milvus, Pinecone, RAG

PROMPT ENGINEERING: Chain-of-Thought, Few-shot Prompting, Prompt Guardrails, N-gram Prompting

TIME-SERIES FORECASTING: ARIMA, SARIMA, Prophet, TFT

TESTING/EXPLAINABILITY: SHAP, LIME, PDP, BLEU, ROUGE, AUC-ROC, Red Teaming, CI/CD (GitHub Actions, Azure DevOps), RAGAS Metrics

MLOPS & AUTOMATION: MLflow, Docker, Kubernetes, Apache Airflow, Kedro, FastAPI, MLAlgos, n8n, Cursor AI

CLOUD: Azure (OpenAI, AKS), AWS (S3, Lambda, SageMaker), GCP (Vertex AI)

SECURITY & ETHICS: HIPAA, PII/PHI Redaction, Adversarial Prompt Testing, HITL Systems

VISUALIZATION/REPORTING: Power BI, Tableau, Matplotlib, Seaborn, Plotly

DATABASES: SQL, MongoDB, NoSQL, Snowflake

VERSION CONTROL: Git, Bitbucket, TFS

PYTHON LIBRARIES: NumPy, Pandas, Scikit-learn, TensorFlow, PyTorch

PROJECTS UNDERTAKEN

Walgreens Healthcare AI-First Clinical Knowledge Automation Platform Aug 2024-present

Role: Generative AI Engineer

Domain: Healthcare | Cloud: AWS

Challenge:

Hospitals faced challenges with unstructured clinical data, inconsistent compliance documentation, and manual claim validation, leading to inefficiencies in patient care and higher administrative costs.

Innovation:

Leveraged GitHub Copilot as an AI development assistant to streamline ETL coding, fine-tune SQL queries, and integrate APIs, cutting down development time and reducing defect rates.

Conducted a POC with Cursor AI to test AI-powered debugging and collaborative code review features, which demonstrated faster delivery cycles in multi-developer healthcare environments.

Piloted the Model Context Protocol (MCP) for orchestrating multi-agent systems, enabling seamless coordination across clinical summarization, compliance validation, and claims processing modules.

Configured n8n automation pipelines for EHR synchronization, claims verification, and reporting alerts, minimizing repetitive manual intervention and improving workflow efficiency.

Designed RAG-based pipelines with AWS OpenSearch and LangChain to deliver context-aware summarization of medical records, generating evidence-backed insights for clinical decision support.

Prototyped domain-focused AI agents to handle documentation, insurance claims, and compliance checks, ensuring scalable and modular automation for healthcare operations.

Applied data science methods such as text summarization, descriptive analytics, and trend analysis to transform unstructured clinical notes into structured insights for actionable reporting.

Built real-time data engineering pipelines with AWS Glue, Kinesis, S3, and Redshift to integrate patient vitals, EHR datasets, and insurance claims into a unified analytics framework.

Embedded governance and compliance workflows by implementing PII/PHI redaction, audit-ready logs, and HIPAA-aligned safeguards to enhance data trust and regulatory readiness.

MCP: Strengthened context-aware decision-making across distributed AI models.

n8n: Automated multi-cloud data processes, reducing operational overhead.

CrewAI: Optimized retail intelligence workflows, boosting efficiency and improving end-user experience.

Impact:

Reduced administrative workload by 45% through GenAI-driven document summarization and claims automation.

Improved ETL speed and reliability by 30%, ensuring timely updates to clinical and claims data.

Enabled compliance-ready audit trails via RAG-powered summarization pipelines.

Accelerated development and debugging cycles with AI coding assistants (Copilot, Cursor AI).

Established a scalable AWS-native foundation for healthcare GenAI adoption.

Tech Stack: Python, SQL, OpenAI APIs, LangChain, LangGraph, Hugging Face Transformers, RAG Pipelines, AI Agents, GitHub Copilot, Cursor AI (POC), MCP (POC), n8n (POC), AWS (Glue, Kinesis, S3, Redshift, SageMaker), Tableau, Power BI

Wells Fargo Credit Risk & Regulatory Automation System July 2022-June 2023

Role: Machine Learning Engineer

Domain: Finance & Banking | Cloud: Azure

Challenge:

Banks faced difficulty in accurately assessing creditworthiness and meeting regulatory compliance deadlines due to fragmented data pipelines and legacy systems, increasing both operational delays and financial risk exposure.

Innovation:

Built and fine-tuned credit default, fraud detection, and customer risk models using Scikit-learn, XGBoost, LightGBM, and PyTorch, improving predictive accuracy.

Applied Prophet, SARIMA, and TFT to forecast loan repayment trends, liquidity risks, and anomaly detection, supporting proactive portfolio management.

Established a continuous ML lifecycle using MLflow, Docker, Kubernetes (AKS), and Apache Airflow, ensuring scalable, automated retraining and reproducible deployments.

Automated Azure infrastructure provisioning (AKS clusters, storage, pipelines) with Terraform, cutting setup times and ensuring consistency.

Integrated models as FastAPI microservices for real-time credit scoring and fraud alerts, embedded directly into banking applications.

Implemented SHAP, LIME, PDP, and AUC-ROC to guarantee transparent, regulator-ready insights for auditors and compliance officers.

Orchestrated ETL pipelines using Azure Data Factory, SQL, and Snowflake, consolidating transaction, KYC, and regulatory datasets.

Built dashboards with Power BI and Tableau to deliver real-time risk insights for risk management and compliance teams.

Impact:

Improved loan risk prediction accuracy by 20% over legacy approaches.

Reduced regulatory reporting time by 55% through automated pipelines and explainable outputs.

Cut infrastructure provisioning effort by 40% with Terraform-driven IaC.

Established a resilient Azure-native MLOps ecosystem, ensuring continuous compliance with evolving regulations.

Tech Stack: Python, R, Scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, Prophet, SARIMA, TFT, MLflow, Docker, Kubernetes (AKS), Apache Airflow, Terraform, FastAPI, Azure (AKS, Data Factory), SQL, Snowflake, Power BI, Tableau, SHAP, LIME, PDP, AUC-ROC
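The AUC-ROC metric cited above for regulator-ready validation has a simple rank-based definition: the probability that a randomly chosen positive (e.g., a defaulted loan) receives a higher model score than a randomly chosen negative, with ties counting half. A minimal sketch with made-up scores:

```python
def auc_roc(labels, scores):
    """AUC-ROC computed directly from its rank definition:
    fraction of (positive, negative) pairs where the positive
    outscores the negative; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical credit-default labels and model scores.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

This pairwise form is equivalent to the area under the ROC curve and matches what `sklearn.metrics.roc_auc_score` would report for the same inputs.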

United Health Care Smart Insurance Risk Analytics & Fraud Prevention Platform July 2021-June 2022

Role: Data Scientist

Domain: Insurance | Cloud: GCP (Google Cloud Platform)

Challenge:

The insurer struggled with high claim fraud rates, slow actuarial forecasting, and fragmented customer-policy data, impacting profitability and delaying underwriting decisions.

Innovation:

Developed fraud detection, churn prediction, and risk scoring models using Scikit-learn, CatBoost, and LightGBM, improving underwriting efficiency.

Conducted customer segmentation and behavior analysis to enable targeted premium pricing and retention strategies.

Applied ARIMA, Prophet, and TFT to forecast claims frequency, premium revenues, and reserve requirements, supporting actuarial planning.

Built ETL pipelines with GCP Dataflow, BigQuery, and Cloud Storage, integrating diverse customer, claims, and policy datasets.

Designed interactive dashboards in Tableau, Power BI, and Plotly, giving actuaries and underwriters real-time insights into claims ratios and portfolio risks.

Consolidated insurance data into SQL, NoSQL, and Snowflake repositories, enabling scalable and efficient analytics.

Implemented PII redaction and data validation controls to safeguard sensitive policyholder data and meet regulatory standards.

Impact:

Reduced fraud-related losses by 18% with predictive analytics.

Improved actuarial forecast accuracy by 15%, strengthening financial planning.

Enhanced underwriting decision speed with real-time dashboards and unified datasets.

Delivered a GCP-native analytics platform modernizing claims and risk management.

Tech Stack: Python, R, Scikit-learn, CatBoost, LightGBM, SQL, NoSQL, Snowflake, Tableau, Power BI, Plotly, GCP (BigQuery, Dataflow, Cloud Storage), Prophet, ARIMA, TFT
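The ARIMA/Prophet/TFT models above are far richer, but the core idea of projecting claims from recent history can be illustrated with a much simpler baseline, simple exponential smoothing (claim counts are hypothetical):

```python
def ses_forecast(series, alpha=0.5):
    """Simple exponential smoothing: each step blends the newest
    observation with the previous smoothed level; the final level
    is the one-step-ahead forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hypothetical monthly claim counts.
claims = [100, 120, 110, 130]
print(ses_forecast(claims))  # → 120.0
```

A larger alpha weights recent observations more heavily; ARIMA and Prophet extend this idea with trend, seasonality, and autoregressive structure.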

Reliance Retail Jan 2021-June 2021

Role: Associate Data Scientist

Challenge:

Faced the complexity of cleaning large-scale raw behavioural datasets and engineering meaningful features to capture nuanced user-product interactions.

Balanced the need for real-time recommendation accuracy with scalability and system stability in a high-traffic e-commerce environment.

Innovation:

Engineered data preprocessing workflows using Python (pandas, NumPy) to automate dataset cleaning and restructuring.

Built feature engineering modules to extract product and user interaction attributes that improved model learning.

Implemented collaborative filtering (Surprise, scikit-learn) and content-based filtering (NLP libraries) for recommendation diversity.

Designed hybrid recommendation architectures that merged collaborative and content-based filtering for higher precision.

Automated training pipelines with TensorFlow and PyTorch to retrain models on historical engagement data.

Deployed models using Docker and Kubernetes, ensuring smooth integration into the retailer’s large-scale infrastructure.

Optimized Snowflake data pipelines with Snowpark API for ML-driven analytics and faster query execution.

Applied parallel processing and distributed computing to enhance efficiency and scalability of recommendation workflows.

Impact:

Delivered personalized and precise recommendations that improved user engagement and conversion rates.

Strengthened system performance, scalability, and availability by optimizing pipelines and monitoring production systems.

Enabled data-driven decision-making through A/B testing, evaluation frameworks, and actionable insights for model improvement.

Tech Stack: Python (pandas, NumPy, scikit-learn, Surprise, TensorFlow, PyTorch), NLP libraries (spaCy, NLTK), SQL, Snowflake (Snowpark API), Power BI (DAX, custom visuals), Docker, Kubernetes, distributed computing, logging & monitoring solutions.
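The user-based collaborative filtering described above predicts a rating as a similarity-weighted average over other users' ratings; here it is sketched in plain Python rather than the Surprise library, with invented users and ratings:

```python
import math

# Hypothetical user -> {item: rating} matrix.
ratings = {
    "alice": {"phone": 5, "case": 4, "charger": 1},
    "bob":   {"phone": 4, "case": 5, "charger": 2, "stylus": 5},
    "carol": {"phone": 1, "case": 2, "charger": 5, "stylus": 1},
}

def sim(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    nv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    pairs = [(sim(user, v), r[item]) for v, r in ratings.items()
             if v != user and item in r]
    total = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / total if total else 0.0
```

Because alice's tastes align with bob's (similar phone/case/charger ratings), `predict("alice", "stylus")` lands well above the unweighted mean of bob's and carol's stylus ratings, which is the behavior a hybrid recommender builds on.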

ACADEMIC CREDENTIALS

University of North Texas

Master’s in Computer Science

Madanapalle Institute of Technology and Science

Bachelor’s in Computer Science and Technology

CERTIFICATIONS

Microsoft Certified: Azure Data Engineer Associate (DP-203)

AI For Everyone – DeepLearning.AI (Coursera, 2020)

Programming for Everybody (Getting Started with Python) – University of Michigan (Coursera, 2020)

Python Data Structures – University of Michigan (Coursera, 2020)


