Data Scientist, AI Engineer

Location:
Buffalo, NY
Posted:
October 20, 2025


BINDU

Gen AI Engineer

*****.***@*****.*** linkedin +1-716-***-****

PROFESSIONAL SUMMARY:

● Around 7 years of experience as a GenAI Engineer and Data Scientist specializing in LLMs, RAG, and ML solutions across Finance and Healthcare. Proven expertise in building scalable AI systems, optimizing ML workflows, and deploying enterprise-grade GenAI applications in multi-cloud environments.

● Extensive expertise in designing, fine-tuning, and deploying Large Language Models (LLMs) including GPT, LLaMA, Falcon, and Claude with LangChain, RAG pipelines, and vector search integrations.

● Led end-to-end application migrations from AWS and Azure to GCP, modernizing enterprise workloads using GKE, Cloud SQL, BigQuery, and Cloud Functions.

● Hands-on experience in application development with Python (FastAPI), Node.js, and Java (Spring Boot), deploying scalable microservices integrated with ML/AI pipelines.

● Designed RAG-powered AI assistants using GPT-4, Pinecone, and LangChain to enable intelligent knowledge retrieval for financial and healthcare clients.

● Fine-tuned domain-specific LLMs with LoRA/PEFT techniques, improving accuracy and contextual relevance for enterprise use cases.

● Built predictive ML models for fraud detection, customer churn, demand forecasting, and clinical outcomes, leveraging Random Forest, XGBoost, and Neural Networks.

● Strong background in NLP: developed pipelines for document summarization, named entity recognition (NER), sentiment analysis, and medical text de-identification.

● Designed and deployed time-series forecasting models for patient monitoring, telecom call volumes, and financial risk management.

● Automated end-to-end ML and GenAI workflows with Airflow, Kubeflow, MLflow, Jenkins, and GitHub Actions, ensuring scalable and reproducible model deployments.

● Developed ETL and real-time data pipelines using Kafka, Databricks, Spark, and Snowflake for large-scale data processing and ML training.

● Implemented cloud-native AI solutions using AWS SageMaker, Bedrock, GCP Vertex AI, and Azure ML across multi-cloud environments.

● Applied cloud networking, IAM, and data security best practices to ensure compliance with HIPAA, SOC2, and financial regulatory requirements.

● Designed vector database search solutions with Pinecone, Weaviate, FAISS, and Milvus for retrieval-augmented generation and semantic search.

● Developed graph-based ML solutions for drug interaction detection and fraud pattern discovery in financial and telecom datasets.

● Created real-time dashboards and BI reports in Tableau, Power BI, and Looker to monitor AI-driven insights and performance metrics.

● Implemented monitoring and observability solutions with Stackdriver, CloudWatch, Grafana, Prometheus, and ELK for AI/ML workloads.

● Mentored junior data scientists and engineers, leading AI/ML workshops and GenAI adoption sessions across client organizations.

● Partnered with cross-functional teams to translate business requirements into scalable AI/ML solutions, driving digital transformation initiatives across industries.

TECHNICAL SKILLS:

● Generative AI & LLMs: GPT, LLaMA, Falcon, Claude, LangChain, LangGraph, LangFlow, RAG, LoRA/PEFT, Prompt Engineering

● ML & DL Frameworks: PyTorch, TensorFlow, Scikit-learn, Hugging Face, spaCy, BioBERT

● Programming & Data: Python, R, SQL, Spark, Django, Pandas, NumPy, Flask, FastAPI

● Cloud & MLOps: AWS SageMaker, GCP Vertex AI, Azure ML, MLflow, Kubeflow, Docker, Kubernetes

● Vector DBs & Search: Pinecone, Weaviate, FAISS, Milvus, ElasticSearch

● Data Engineering: Airflow, Kafka, Databricks, ETL, Data Lakes, BigQuery, Snowflake

● Visualization & BI: Tableau, Power BI, Looker, Plotly, Matplotlib, Seaborn

● Monitoring & Logging: Stackdriver, CloudWatch, ELK Stack, Grafana, Prometheus

● Methodologies: Agile (Scrum, Kanban), SDLC, CI/CD

PROFESSIONAL EXPERIENCE:

Intralot, Chicago, IL APRIL 2024 – Present

Role: Sr. GenAI/ML Engineer

Responsibilities:

● Designed and deployed generative AI solutions such as Retrieval-Augmented Generation (RAG) pipelines by integrating GPT-4, LangChain, and Pinecone, enabling intelligent knowledge retrieval from structured and unstructured enterprise data sources, significantly improving search accuracy and response quality for financial domain applications.

● Built an AI-powered customer support assistant leveraging LLMs that streamlined call centre operations by automating responses to repetitive queries, reducing manual intervention, and enabling real-time escalation for complex cases across multi-channel support platforms, using the agentic AI framework LangGraph with multi-agent orchestration.

● Developed Python-based RESTful APIs to integrate LLM services into enterprise applications, ensuring seamless interaction between AI models and internal systems such as CRM, knowledge management, and reporting platforms.

● Implemented robust data pipelines using Apache Airflow and Apache Spark to automate the ingestion, transformation, and preparation of large-scale financial datasets for model training, ensuring high-quality, reliable, and consistent data availability for GenAI and ML workloads.

● Conducted fine-tuning of LLaMA models with domain-specific financial datasets, improving contextual accuracy for tasks such as risk analysis, compliance checks, and document summarization tailored to the banking and financial industry.

● Designed analytical workflows in Google BigQuery to process, query, and visualize high-volume financial transaction data, providing deep insights into customer behaviour, fraud detection patterns, and overall portfolio performance.

● Applied advanced transformer-based NLP models such as BERT and RoBERTa for sentiment analysis and named entity recognition, extracting actionable intelligence from financial documents, customer communications, and market news feeds.

● Built interactive dashboards in Tableau that consolidated ML-driven risk scores, fraud alerts, and performance metrics, enabling executives and analysts to track financial risk analytics in real time with clear visualizations and drill-down capabilities.

● Developed automated CI/CD pipelines for ML models using MLflow, Docker, and GitHub Actions, ensuring version control, reproducibility, and seamless deployment of machine learning models into production environments on cloud platforms.

● Applied cloud security best practices by implementing IAM policies, encryption standards, and audit logging for financial datasets hosted on multi-cloud platforms, ensuring compliance with strict regulatory frameworks such as SOC2 and PCI DSS.

● Collaborated with DevOps teams to deploy large-scale ML models on Google Cloud Vertex AI, leveraging containerized workloads on GKE clusters for high-performance, scalable inference serving to meet enterprise-level demand.

● Configured and managed hybrid cloud networking between AWS and GCP, including VPNs, VPC peering, and load balancing, to ensure secure and reliable connectivity across multi-cloud environments supporting AI/ML applications.

● Created custom embeddings for financial documents using Hugging Face Transformers, enabling semantic search, similarity matching, and document classification to enhance knowledge discovery within large repositories of financial records.

● Automated monitoring and observability workflows by integrating Google Stackdriver and Prometheus, setting up real-time alerts and dashboards for model performance, infrastructure health, and latency tracking across distributed systems.

● Supported business stakeholders with ad-hoc ML model performance reporting, preparing clear insights and tailored visualizations to help decision-makers evaluate the effectiveness of deployed AI solutions and align outcomes with organizational goals, delivering in Agile sprints with close coordination across data, DevOps, and product teams.

Environment: Python, PyTorch, Hugging Face, LangChain, Pinecone, GCP Vertex AI, BigQuery, Airflow, Spark, Tableau, Docker, Kubernetes, MLflow, Prometheus, Stackdriver.
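The retrieval step behind the RAG pipelines described above can be sketched in a few lines: embed the query, rank stored document vectors by cosine similarity, and pass the top matches to the LLM as grounding context. The minimal sketch below uses hand-made toy vectors as a hypothetical stand-in for a real embedding model and a Pinecone index.

```python
# Minimal sketch of RAG retrieval: rank stored document vectors by cosine
# similarity to a query vector and return the top-k passages. The 3-d
# vectors below are illustrative stand-ins for real embeddings.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """index: list of (doc_text, vector) pairs; returns top-k doc_texts."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Toy index: three "documents" with hand-made embeddings.
index = [
    ("Quarterly risk report", [0.9, 0.1, 0.0]),
    ("Cafeteria menu",        [0.0, 0.2, 0.9]),
    ("Fraud alert summary",   [0.8, 0.3, 0.1]),
]
top = retrieve([1.0, 0.2, 0.0], index, k=2)
# -> ["Quarterly risk report", "Fraud alert summary"]
```

In production the toy `index` is replaced by a vector database query and the retrieved passages are interpolated into the LLM prompt.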

Elevance Health, Norfolk, Virginia APRIL 2023 – MAR 2024

Role: AI/ML Engineer

Responsibilities:

● Developed advanced NLP pipelines for medical record classification by leveraging transformer models such as BERT and BioBERT, enabling automated extraction and categorization of clinical terms, diagnoses, and procedures to support faster and more accurate medical decision-making.

● Built Generative AI models for clinical note summarization, transforming lengthy physician notes and patient histories into concise, structured summaries, improving physician efficiency and enhancing interoperability across healthcare systems.

● Integrated FHIR (Fast Healthcare Interoperability Resources) standards within AI/ML pipelines, ensuring that data exchange between systems adhered to industry healthcare interoperability protocols and supported seamless integration with EHR platforms.

● Deployed ML models on AWS SageMaker and generative AI models on AWS Bedrock for high-performance production workloads, enabling scalable inference, automated retraining pipelines, and monitoring of healthcare models at enterprise scale.

● Implemented patient data de-identification pipelines using NLP and custom scripts to mask PHI (Protected Health Information), ensuring compliance with HIPAA regulations and protecting sensitive patient data during AI model training and deployment.

● Designed time-series forecasting models for patient monitoring, predicting vital signs such as heart rate, blood pressure, and respiratory trends, enabling proactive clinical interventions and supporting personalized healthcare.

● Applied LoRA-based fine-tuning techniques on healthcare-specific LLMs, adapting general-purpose models like LLaMA and Falcon to clinical text datasets, improving the contextual accuracy of patient interaction and documentation tasks.

● Built graph-based ML solutions for drug interaction detection, modelling patient medication networks and identifying potential adverse drug combinations, assisting healthcare providers in clinical safety decisions.

● Orchestrated multi-agent clinical safety workflows using LangGraph, where LLM agents leveraged graph-based ML models to detect potential drug interactions and recommend safe alternatives in real time.

● Automated CI/CD workflows using Jenkins and GitHub Actions, ensuring that ML models and pipelines were continuously tested, validated, and deployed across development, staging, and production environments.

● Created high-throughput ETL pipelines with Apache Kafka to ingest and stream real-time medical data from multiple hospital systems, ensuring timely data availability for downstream AI/ML models.

● Deployed Kubernetes clusters to host ML and GenAI microservices, enabling containerized deployment, scalability, and fault tolerance for healthcare applications across multi-cloud environments; participated in sprint planning, backlog refinement, and daily standups as part of an Agile delivery model.

● Designed compliance-driven AI frameworks that embedded HIPAA and SOC2 requirements into ML workflows, ensuring that all AI deployments adhered to stringent security, auditability, and regulatory standards.

● Developed interactive dashboards in Power BI to visualize clinical outcomes, patient risk scores, and predictive model insights, supporting physicians and hospital administrators in data-driven decision-making.

● Conducted A/B testing of AI-powered patient engagement tools, evaluating the effectiveness of chatbots and virtual assistants in improving patient communication, appointment scheduling, and adherence to treatment protocols.

Environment: Python, TensorFlow, Hugging Face, BioBERT, AWS SageMaker, Kafka, Jenkins, GitHub Actions, Power BI, Docker, Kubernetes, HIPAA Compliance Tools.
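The rule-based side of the PHI-masking pipelines described above can be sketched with a handful of regex substitutions. This is a minimal illustration only: real HIPAA de-identification covers all 18 identifier categories and typically combines rules with NER models, and the sample note and patterns below are hypothetical.

```python
# Illustrative rule-based PHI masking: each regex replaces one identifier
# type with a placeholder token. Patterns here are simplified examples,
# not a complete HIPAA de-identification rule set.
import re

PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # social security numbers
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),        # US phone numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email addresses
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),     # dates like 03/14/2023
]

def mask_phi(text: str) -> str:
    """Apply each PHI pattern in turn, replacing matches with its token."""
    for pattern, token in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text

note = "Pt seen 03/14/2023, SSN 123-45-6789, call 716-555-0100."
masked = mask_phi(note)
# -> "Pt seen [DATE], SSN [SSN], call [PHONE]."
```

Ordering matters when patterns could overlap; in a production pipeline an NER model would catch names and addresses that no regex can enumerate.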

Helyxon Healthcare Solutions, INDIA DEC 2019 – DEC 2022

Role: Data Scientist

Responsibilities:

● Developed predictive analytics models (Random Forest, XGBoost, Neural Networks) for patient risk stratification, readmission prediction, and disease progression forecasting.

● Designed NLP pipelines to process clinical notes, EHR data, and physician documentation, enabling automated named entity recognition (NER) for diagnoses, medications, and procedures.

● Built time-series forecasting models to monitor patient vitals (heart rate, blood pressure, oxygen levels), supporting proactive clinical interventions.

● Implemented data de-identification and PHI masking pipelines using NLP and rule-based methods to ensure HIPAA compliance.

● Applied Generative AI for clinical note summarization and automated medical documentation, improving physician productivity.

● Developed drug interaction detection models using graph-based ML techniques to identify adverse medication combinations.

● Created ETL and streaming pipelines with Apache Kafka, Spark, and Databricks to process multi-source healthcare data in real time.

● Integrated FHIR standards for healthcare data interoperability, ensuring seamless data exchange between hospital systems and AI pipelines.

● Deployed ML/GenAI models on AWS SageMaker, enabling scalable inference and retraining pipelines, and iteratively refined models in 2-week sprints following Agile Scrum practices.

● Built interactive dashboards in Power BI to visualize patient risk scores, treatment outcomes, and hospital performance metrics.

● Conducted A/B testing of AI-powered patient engagement tools (chatbots, symptom checkers), measuring improvements in communication and appointment adherence.

● Collaborated with clinicians, healthcare providers, and IT teams to translate medical needs into AI/ML solutions that support evidence-based decision-making.

Environment: Python, TensorFlow, PyTorch, Hugging Face Transformers (BERT, BioBERT, GPT models), Scikit-learn, Pandas, NumPy, Apache Spark, Databricks, Kafka, AWS SageMaker, SQL, Power BI, Tableau, FHIR, HL7, Docker, Kubernetes, Git, HIPAA Compliance Tools.
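The core idea of the graph-based drug-interaction detection mentioned above can be sketched simply: drugs are nodes, known adverse interactions are edges, and a patient's medication list is checked for any edge contained within it. The interaction pairs below are hypothetical examples chosen for illustration, not clinical guidance.

```python
# Sketch of graph-based drug-interaction screening: the known-adverse
# pairs form the edge set of an undirected interaction graph, and we
# test every pair of a patient's medications for membership in it.
# The pairs listed are illustrative, not medical advice.
from itertools import combinations

INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}),
    frozenset({"simvastatin", "clarithromycin"}),
}

def flag_interactions(medications):
    """Return every known adverse pair present in a medication list."""
    return [tuple(sorted(pair))
            for pair in (frozenset(p) for p in combinations(medications, 2))
            if pair in INTERACTIONS]

flags = flag_interactions(["aspirin", "metformin", "warfarin"])
# -> [("aspirin", "warfarin")]
```

At scale the edge set would come from a curated knowledge base rather than a literal, and graph ML methods (e.g. link prediction) extend this to interactions not yet catalogued.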

Infosys, INDIA MAY 2018 – NOV 2019

Role: Data Scientist

Responsibilities:

● Worked with a broad range of AWS cloud services, including EC2, ELB, Glacier, CloudFront, CodeDeploy, Elastic Beanstalk, Auto Scaling, Route53, AMI, SNS, SQS, DynamoDB, Elasticsearch, and CloudWatch, with in-depth practical knowledge of other cloud services.

● Developed and optimized large-scale generative models such as GANs and VAEs for various applications, including image generation and text synthesis.

● Worked on tasks involving language modeling, text generation, and contextual comprehension.

● Integrated Azure Cognitive Services to extend functionality and improve AI solutions.

● Handled data pre-processing, augmentation, and generation of synthetic data to improve model accuracy.

● Built and deployed AI applications on Azure, utilizing services like Azure Databricks, Azure ML Studio, and Azure Data Lake.

● Collaborated with teams to integrate AI models into existing applications, enhancing features and user experience.

● Ensured the robustness, efficiency, and scalability of AI systems through continuous monitoring and optimization.

Environment: Python, SQL, PySpark, TensorFlow, PyTorch, Scikit-learn, Hugging Face Transformers, Pandas, NumPy, Matplotlib, Seaborn, Databricks, Apache Spark, Apache Kafka, Airflow, Snowflake, AWS SageMaker, Azure Databricks, Azure ML Studio, Azure Data Lake, Agile Methodology.
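One simple form of the text augmentation mentioned above is random word dropout, which generates synthetic training variants of a sentence. The sketch below is a minimal, hypothetical illustration; the drop probability and seed are arbitrary demonstration choices.

```python
# Illustrative text augmentation via random word dropout: each word is
# dropped independently with probability p, producing a synthetic variant
# of the input for training. Seeded for reproducibility.
import random

def word_dropout(sentence: str, p: float = 0.2, seed: int = 0) -> str:
    """Drop each word with probability p, always keeping at least one word."""
    rng = random.Random(seed)
    words = sentence.split()
    kept = [w for w in words if rng.random() >= p]
    return " ".join(kept) if kept else words[0]

sentence = "the model predicts customer churn from usage data"
augmented = word_dropout(sentence)
```

Varying the seed yields multiple distinct variants per input; in practice this is combined with techniques such as synonym replacement or back-translation.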
