
AI/ML Engineer - LLMs, RAG, MLOps, VectorDBs

Location:
Farmers Branch, TX, 75244
Posted:
January 20, 2026


Resume:

KOLUGURI SAI KIRAN

Dallas, Texas +1-972-***-**** *****************@*****.***

Professional Summary:

● AI/ML Engineer with 5 years of hands-on experience developing and deploying machine learning and Generative AI solutions across the finance, healthcare, and enterprise SaaS domains.

● Proficient in LLM-based systems, including Retrieval-Augmented Generation (RAG) pipelines, fine-tuning (LoRA, SFT), and prompt engineering for real-world AI applications.

● Skilled in MLOps platforms such as MLflow, TensorFlow Extended (TFX), Kubeflow, SageMaker, Azure ML, and Databricks for end-to-end model lifecycle automation.

● Experienced in vector databases (FAISS, Pinecone, Milvus) and semantic search to enable efficient retrieval and knowledge-augmented AI systems.

● Hands-on expertise in Python, FastAPI, and Flask for building scalable model APIs and containerized deployments using Docker and Kubernetes (AKS/EKS).

● Strong background in cloud AI development (AWS, Azure, GCP), including model training, deployment, and monitoring using SageMaker, Vertex AI, and Azure ML.

● Familiar with CI/CD and Infrastructure as Code tools (GitLab CI/CD, Jenkins, Terraform) for continuous delivery of ML services.

● Practical experience in observability and monitoring using Prometheus, Grafana, and CloudWatch to ensure reliable, production-ready ML operations.

● Applied Responsible AI principles by implementing model explainability, bias detection, and safe prompt evaluation practices.

● Proven collaborator in Agile/DevOps environments, delivering robust AI features through teamwork, experimentation, and iteration.

Technical Skills:

Programming & Scripting

● Python, Java, JavaScript, TypeScript, SQL, Bash

Machine Learning & AI

● Scikit-learn, TensorFlow, PyTorch, HuggingFace Transformers, LangChain, vLLM

● LLMs (GPT, LLaMA, Falcon, BERT), LoRA Fine-Tuning, Prompt Engineering, Retrieval-Augmented Generation (RAG)

● Embeddings (Sentence Transformers), Vector Databases (Milvus, FAISS, Pinecone)

● NLP, Computer Vision (OpenCV, YOLO), Forecasting, Time Series Analysis

MLOps & Model Lifecycle

● MLflow, TensorFlow Extended (TFX), Kubeflow, Vertex AI, SageMaker, Azure ML, Databricks

● Experiment Tracking, Model Versioning, Deployment Automation, Model Drift & Bias Monitoring

Data Engineering & Processing

● Apache NiFi, Apache Kafka, Apache Spark, Airflow, GCP Composer

● SQL (MSSQL, PostgreSQL, MySQL, BigQuery, Snowflake), NoSQL (MongoDB, DynamoDB)

● AWS Textract, AWS Glue, Azure Data Factory, ETL/ELT Workflows

Cloud Platforms

● AWS: SageMaker, Lambda, CodePipeline, CodeBuild, CodeCommit, ECS/Fargate, CloudFormation, Textract, Bedrock, RDS, DynamoDB, S3, ECR, CloudWatch, Secrets Manager, SSM Parameter Store, SageMaker Feature Store

● Azure: AKS, ACR, Databricks, Data Factory, ADLS, Azure Functions, Azure Monitor

● GCP: Vertex AI, GKE, BigQuery, Composer, Pub/Sub, Cloud Functions, Cloud Storage

Containerization & Orchestration

● Docker, Kubernetes (EKS, AKS, GKE), Helm, Kustomize

● Service Mesh: Istio, AWS App Mesh

● Ingress Controllers: NGINX, ALB Ingress, Azure Application Gateway, GCP Load Balancer

CI/CD & Infrastructure as Code

● GitLab CI/CD, Jenkins, Azure DevOps, AWS CodePipeline, CodeBuild, CodeCommit, Terraform, CloudFormation, Argo CD, Ansible

Monitoring, Logging & Observability

● Prometheus, Grafana, ELK Stack, CloudWatch, Datadog

Web Development & APIs

● React, Node.js, FastAPI, Flask, REST, GraphQL, gRPC, APISIX

● Responsive UI/UX, Dashboards, Streamlit, Tableau, QlikView

Dev Productivity & Collaboration

● Git, Jira, Confluence, Agile/Scrum

● AI Productivity Tools: Cursor, Windsurf, GitHub Copilot

Professional Experience:

SDE/ML Engineer Intern Aug 2025 – Present

Sainar Solution, Frisco, Texas

Tools and Technologies: Python, TensorFlow, PyTorch, Scikit-learn, HuggingFace, LangChain, RAG Pipelines, MLflow, Databricks, Azure ML, AWS SageMaker, Docker, Kubernetes (AKS/EKS), FastAPI, Terraform, GitLab CI/CD.

Roles and Responsibilities:

● Assisted in designing and developing ML pipelines for model training, testing, and deployment across multi-cloud environments (Azure, AWS).

● Supported Retrieval-Augmented Generation (RAG) pipeline development using LangChain and HuggingFace Transformers to enhance contextual response accuracy.
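The retrieval step of such a RAG pipeline can be sketched in plain Python. Here toy two-dimensional vectors stand in for real embeddings from an encoder such as a Sentence Transformer; the documents, vectors, and function names are purely illustrative:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, docs, k=2):
    # Rank documents by similarity to the query and keep the top-k,
    # which would then be injected into the LLM prompt as context.
    scored = sorted(zip(docs, doc_vecs),
                    key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [doc for doc, _ in scored[:k]]

docs = ["reset password", "billing cycle", "api rate limits"]
doc_vecs = [[1.0, 0.1], [0.1, 1.0], [0.7, 0.7]]
context = retrieve([0.9, 0.2], doc_vecs, docs, k=2)
```

A production system would swap the brute-force scan for an approximate index (FAISS, Milvus, Pinecone), but the ranking logic is the same.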

● Helped fine-tune and evaluate LLMs (GPT, LLaMA, Falcon) using LoRA and supervised fine-tuning techniques to improve inference performance.

● Built and tested FastAPI-based microservices for model inference and integration with frontend applications.

● Worked with MLflow and Databricks for experiment tracking, model registry, and lifecycle management.

● Deployed containerized ML applications using Docker and Kubernetes, focusing on scalability and reproducibility.

● Assisted in setting up Azure ML and SageMaker environments for automated training and model monitoring.

● Created evaluation metrics dashboards using Prometheus and Grafana to monitor model performance and latency.
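A dashboard statistic like p95 latency boils down to a percentile over recent request timings. A minimal nearest-rank sketch (in a real setup this value would be exported through a Prometheus client library rather than computed by hand; the sample data is illustrative):

```python
import math

def percentile(samples, pct):
    # Nearest-rank percentile: sort, then take the ceil(p/100 * n)-th sample.
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 90, 14, 13, 240, 16, 12, 15]
p95 = percentile(latencies_ms, 95)  # dominated by the slow outliers
p50 = percentile(latencies_ms, 50)  # median, insensitive to outliers
```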

● Participated in Agile sprints, supporting backlog refinement, documentation, and demo presentations.

● Collaborated with senior engineers to integrate AI-driven features such as semantic search, summarization, and text generation into existing systems.

● Operated in an Agile/Scrum environment, collaborating with product owners, data scientists, and DevOps teams to deliver ML features incrementally.

SDE/Machine Learning Engineer May 2022 – Aug 2023

ServiceNow, Hyderabad, India

Tools and Technologies: Python, PyTorch, TensorFlow, HuggingFace Transformers, Scikit-learn, NumPy, Pandas, FastAPI, Flask, GPT, LLaMA, Falcon, BERT, LoRA, RAG Pipelines, Prompt Engineering, LangChain, vLLM, MLflow, Kubeflow Pipelines, TensorFlow Extended (TFX), Databricks, Vertex AI, SageMaker, Azure ML, Apache Spark, SQL (MSSQL, PostgreSQL), Elasticsearch, AWS Textract, Docker, Kubernetes (EKS, AKS, GKE), GitLab CI/CD, Jenkins, Azure DevOps, Terraform, Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Cursor, Windsurf, GitHub Copilot.

Roles and Responsibilities:

● Designed and developed end-to-end ML workflows for enterprise AI solutions, leveraging LLMs, Generative AI, and Retrieval-Augmented Generation (RAG) pipelines to improve model performance and relevance.

● Built prototypes and proof-of-concepts (POCs) for integrating AI into ServiceNow applications, including intelligent ticket classification, predictive analytics, and chatbot enhancements.

● Conducted large-scale experiments on LLMs to assess accuracy, bias, robustness, and fairness, applying responsible AI practices to ensure compliance and transparency.

● Fine-tuned pre-trained LLMs (GPT, LLaMA, Falcon, BERT) using LoRA, distillation, and supervised fine-tuning, reducing latency while improving accuracy in production.
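The idea behind LoRA fine-tuning is to freeze the base weight matrix W and train only a low-rank update ΔW = B·A, shrinking the trainable parameter count. A toy numeric sketch with tiny matrices (values are illustrative, not from any real model):

```python
def matmul(A, B):
    # Naive matrix multiply, fine for small illustrative matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

# Frozen 4x4 base weight; the rank-1 adapter factors are B (4x1) and
# A (1x4), so trainables drop from 16 parameters to 8.
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
B = [[0.1], [0.0], [0.2], [0.0]]
A = [[1, 0, 1, 0]]

delta = matmul(B, A)        # rank-1 update, 4x4
W_adapted = add(W, delta)   # effective weight used at inference
```

At serving time the update can be merged into W once, so the adapted model adds no inference latency.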

● Deployed MLflow Tracking and Model Registry on Azure and GCP to manage experiments, hyperparameter tuning, and versioned model packaging.

● Implemented MLflow REST APIs to integrate experiment metadata into internal dashboards and model performance visualization tools.

● Implemented prompt engineering techniques and retrieval augmentation for enterprise search and conversational AI applications, improving precision in query responses.
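A retrieval-augmented prompt is typically assembled from a fixed template plus the retrieved passages. A library-free sketch (the template wording and example passage are illustrative, not from any production system):

```python
def build_prompt(question, passages):
    # Join retrieved passages into a context block and instruct the
    # model to answer only from that context, reducing hallucination.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "How do I reset my password?",
    ["Passwords can be reset from Settings > Security."],
)
```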

● Developed scalable training and deployment pipelines using MLflow, TensorFlow Extended, HuggingFace Transformers, and Kubeflow Pipelines, ensuring reproducibility across environments.

● Orchestrated containerized ML workloads on Kubernetes (EKS, AKS, GKE), integrating with Istio/App Mesh for service mesh security, observability, and traffic routing.

● Automated ML infrastructure provisioning using Terraform, ArgoCD, and GitLab CI/CD, enabling continuous integration and delivery of ML models and services.

● Deployed models for real-time inference using SageMaker endpoints, Vertex AI, Azure ML, ONNX, and Triton Inference Server, ensuring low-latency, production-grade serving.

● Curated, cleaned, and augmented datasets (synthetic and real-world) to improve training pipelines, including data labeling, feature engineering, and anomaly detection.

● Designed monitoring dashboards with Prometheus, Grafana, CloudWatch, and Kibana to track model drift, accuracy decay, inference latency, and SLA compliance.
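Drift of the kind those dashboards track is often quantified with the population stability index (PSI) over binned score distributions; values above roughly 0.2 are commonly treated as significant drift. A pure-Python sketch with made-up bin fractions:

```python
import math

def psi(expected, actual):
    # PSI = sum((a - e) * ln(a / e)) over matching bin fractions of the
    # baseline (training-time) and live score distributions.
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

baseline = [0.25, 0.25, 0.25, 0.25]   # score bins at training time
stable   = [0.24, 0.26, 0.25, 0.25]   # live traffic, no drift
drifted  = [0.10, 0.15, 0.25, 0.50]   # live traffic, shifted scores

low = psi(baseline, stable)    # near zero
high = psi(baseline, drifted)  # well above the ~0.2 alert threshold
```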

● Integrated ML services with the ServiceNow platform, enabling AI-driven automation across ITSM, incident response, and knowledge management workflows.

● Leveraged AWS Textract and NLP models to extract structured data from unstructured documents and integrated results into ML pipelines.

● Built and managed FastAPI-based inference microservices, deployed with Docker/Kubernetes for scalable, cloud-native ML delivery.

● Applied responsible AI principles by implementing bias testing, model interpretability (SHAP, LIME), and fairness checks in evaluation workflows.

● Worked in a Scaled Agile (SAFe) framework to coordinate ML releases across multiple teams, ensuring alignment with quarterly OKRs.

● Built reproducible CI/CD pipelines for AI workflows using CodePipeline and GitLab CI/CD, automating deployment of ML microservices to AWS ECS and SageMaker endpoints.

● Maintained an Agile Kanban dashboard for tracking experiment lifecycle, data preparation tasks, and CI/CD automation deliverables.

● Explored AI productivity tools (Cursor, Windsurf, Copilot) to accelerate ML experimentation and improve developer productivity.

● Collaborated with research, product, and engineering teams to align AI/ML features with enterprise business goals, presenting results to technical and non-technical stakeholders.

● Mentored junior engineers in MLOps best practices, including version control, experiment tracking, and pipeline automation.

● Collaborated with DevOps teams to integrate CloudWatch metrics, Datadog agents, and Prometheus exporters for end-to-end model observability.

● Implemented A/B testing and canary deployments for model rollouts, minimizing risk and ensuring reliable production adoption.
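A canary rollout of the sort described routes a small, sticky fraction of traffic to the candidate model. A deterministic hash-based split (identifiers and percentages are illustrative) can be sketched as:

```python
import hashlib

def serve_canary(user_id, canary_percent=5):
    # Hash the user id into a bucket in [0, 100) so the same user always
    # hits the same model version (sticky assignment), while roughly
    # canary_percent of users see the candidate model.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_percent

assignments = [serve_canary(f"user-{i}", canary_percent=10)
               for i in range(1000)]
canary_share = sum(assignments) / len(assignments)
```

If the canary's error rate or latency regresses, the split is dialed back to 0% without redeploying the stable model.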

● Contributed to ServiceNow AI/ML platform integration, enabling AI-powered features such as intelligent ticket classification, chatbot enhancements, and predictive analytics.

● Contributed to multi-cloud ML architecture, deploying hybrid ML solutions across AWS, Azure, and GCP with a focus on scalability and cost optimization.

Full Stack Engineer/MLOps Mar 2019 – Apr 2022

Cognizant, Chennai, India

Clients: Fiserv/First Data, Discover

Tools and Technologies: Python, PyTorch, TensorFlow, HuggingFace, Scikit-learn, SQL, Spark, Pandas, Kubeflow, Vertex AI, Azure ML, RAG pipelines, SageMaker, Databricks, Apache NiFi, Kafka, Elasticsearch, Docker, Kubernetes (EKS/AKS/GKE), Terraform, GitLab CI/CD, Azure DevOps, Prometheus, Grafana

Roles and Responsibilities:

● Designed and implemented AI/ML solutions covering data ingestion, feature engineering, model training, deployment, monitoring, and business integration across financial services, healthcare, and manufacturing domains.

● Built and optimized ML models (classification, regression, anomaly detection, NLP, GenAI prototypes) with hyperparameter tuning, cross-validation, and ensembling to improve predictive accuracy.

● Supported early adoption of Generative AI and LLM techniques (RAG pipelines, prompt engineering, model fine-tuning) for document processing and knowledge automation.

● Created scalable data pipelines using Apache NiFi, Kafka, and Spark to ingest millions of transactions and IoT telemetry data, ensuring seamless flow into ML systems.

● Orchestrated ML workflows on Vertex AI, SageMaker, and Azure ML, deploying models for both batch and real-time inference through APIs and serverless endpoints.

● Worked with MLOps engineers to productionize ML models using Kubeflow Pipelines, MLflow, and containerized microservices (Docker + Kubernetes).

● Integrated ML services into customer-facing applications and internal systems, enabling fraud detection, demand forecasting, and claims processing in production.

● Designed fraud detection pipelines (Fiserv) combining NiFi, MSSQL, and Elasticsearch with ML models deployed as APIs, cutting fraud response time by 30%.

● Designed ML pipelines for claims processing and member services, ensuring HIPAA-compliant handling of sensitive healthcare data.

● Implemented claims validation models exposed through APIs, integrated with existing claims systems for real-time scoring.

● Deployed Elasticsearch-backed search tools for rapid claims lookup and audit reporting.

● Automated model deployments using GitLab CI/CD, Azure DevOps, Jenkins, and Terraform, enabling consistent, version-controlled infrastructure across multi-cloud environments.

● Automated AWS resource provisioning using CloudFormation and Terraform, ensuring IaC best practices across multiple AWS accounts.

● Implemented Lambda-based model monitoring alerts for model drift and SLA breaches, integrating with CloudWatch and Slack notifications.

● Designed and executed live model tests (A/B testing, canary deployments) to validate performance in production environments.

● Evaluated models for bias, fairness, and robustness, ensuring compliance with organizational standards.

● Practiced Agile iteration planning and daily stand-ups, managing parallel ML experiments and model retraining cycles.

● Collaborated with product teams using JIRA and Confluence to maintain transparency in sprint deliverables and cross-functional dependencies.

● Partnered with data scientists, engineers, and business stakeholders to align AI outcomes with customer requirements.

● Guided junior engineers on ML pipeline design, MLOps best practices, and cloud-native deployments, improving team maturity and delivery speed.

● Delivered AI-driven solutions that reduced fraud losses, improved operational efficiency, and accelerated healthcare claims processing, directly contributing to client ROI.

IoT Engineer Intern Jun 2017 – Dec 2018

Cloud Chip Technology Pvt Ltd., Hyderabad, India

Tools and Technologies: Python, OpenCV, TensorFlow, MSSQL, Apache NiFi, Docker, Kubernetes, IoT Hub, Azure ML Studio, React, Elasticsearch

Roles and Responsibilities:

● Designed and implemented IoT data pipelines using NiFi and Kepware to collect and process telemetry from smart devices and PLC controllers.

● Built real-time dashboards in React + Elasticsearch to visualize IoT device metrics such as uptime, performance, and anomaly alerts.

● Developed computer vision prototypes using OpenCV and TensorFlow for defect detection in conveyor belt products.

● Containerized prototypes with Docker and orchestrated with Kubernetes, gaining early exposure to cloud-native ML operations.

● Assisted in building predictive maintenance models for industrial IoT, applying statistical anomaly detection on telemetry streams.
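Statistical anomaly detection on telemetry can be as simple as flagging readings far from the sample mean in standard-deviation units. A minimal z-score sketch (threshold and readings are illustrative; a single large outlier inflates the standard deviation, so a modest threshold is used here):

```python
import math

def zscore_anomalies(readings, threshold=2.5):
    # Flag readings whose z-score against the sample mean/stddev
    # exceeds the threshold.
    mean = sum(readings) / len(readings)
    var = sum((x - mean) ** 2 for x in readings) / len(readings)
    std = math.sqrt(var)
    return [x for x in readings if std and abs(x - mean) / std > threshold]

telemetry = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 55.0, 20.1]
anomalies = zscore_anomalies(telemetry)
```

A streaming version would maintain a rolling mean and variance (e.g. Welford's algorithm) instead of recomputing over the full window.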

● Contributed to ETL workflows to prepare IoT datasets for training early-stage ML models in Azure ML Studio.

● Documented IoT + ML workflows and collaborated with senior engineers to integrate early prototypes into production pipelines.

Education

● Bachelor’s in Electrical and Electronics Engineering, JNTU, India (2014 – 2018).

● Master’s in Computer Science, Southern Arkansas University (Aug 2023 – May 2025).


