VARUN REDDY
************@*****.***
PERSONAL PROFILE STATEMENT
AI/ML, MLOps, and Cloud Engineer with 10+ years of experience building enterprise-grade AI, data, and cloud platforms across Azure, AWS, and GCP. Specializes in Generative AI, LLMOps, RAG architectures, MLOps automation, distributed ML training, and secure cloud-native deployments. Strong background in DevOps, Kubernetes, data engineering, feature engineering, and model productionization. Experienced in designing scalable AI systems, end-to-end ML pipelines, and cross-cloud GenAI solutions for large enterprises.
PROFESSIONAL SUMMARY
AI/ML & MLOps Engineer with 10+ years of experience building and operating enterprise-grade ML platforms across Azure, AWS, and GCP.
Expert in end-to-end ML lifecycle orchestration, including data validation, model packaging, evaluation workflows, environment promotion, and automated deployment gates.
Skilled in designing CI/CD for ML pipelines using Azure DevOps, GitHub Actions, Terraform, and GitOps to streamline and automate ML operations.
Strong expertise in containerizing ML workloads and deploying inference services to AKS/EKS/GKE with autoscaling, GPU nodes, and resilient service mesh integration.
Implemented model deployment strategies including blue-green, canary, shadow testing, and controlled rollout with automated rollback policies.
Designed secure ML infrastructure architectures using private endpoints, managed identities, IAM, Key Vault, VNET isolation, and zero-trust access models.
Built and managed feature engineering and feature store pipelines supporting consistent online/offline feature parity for production ML systems.
Developed scalable data ingestion and transformation pipelines using ADF, Databricks (PySpark), Snowflake, Dataflow, BigQuery, and Delta Lake.
Implemented ML observability frameworks with Prometheus, Grafana, Azure Monitor, MLflow, and OpenTelemetry for real-time model and pipeline monitoring.
Automated drift detection (data drift, model drift) with integrated alerting, diagnostics, and retriggering workflows for continuous ML reliability.
Established enterprise ML governance frameworks including model lineage, audit trails, dataset versioning, compliance controls, and approval gates.
Built scalable batch and real-time inference architectures using Kubernetes, serverless compute, load balancers, and autoscaling policies.
Managed multi-cloud ML environments, ensuring operational standardization, cost optimization, and cross-cloud portability for ML workloads.
Collaborated with data engineering and platform teams to implement high-availability, fault-tolerant ML systems with security, performance, and resiliency best practices.
Delivered production-grade ML ecosystems by integrating monitoring, automation, deployment workflows, governance controls, and operational playbooks for enterprise teams.
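The automated drift detection mentioned above can be illustrated with a minimal population stability index (PSI) check in plain Python; the bin count and the common 0.2 alert threshold are illustrative assumptions, not values from any specific engagement:

```python
import math

def population_stability_index(expected, actual, bins=5):
    """PSI between a baseline sample and a live sample; larger means more drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate constant sample

    def frac(sample, b):
        in_bin = sum(
            1 for x in sample
            if (lo + b * width <= x < lo + (b + 1) * width) or (b == bins - 1 and x == hi)
        )
        return max(in_bin / len(sample), 1e-6)  # floor empty bins to avoid log(0)

    return sum(
        (frac(actual, b) - frac(expected, b)) * math.log(frac(actual, b) / frac(expected, b))
        for b in range(bins)
    )

baseline = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
shifted = [x + 0.5 for x in baseline]

print(population_stability_index(baseline, baseline))  # identical data: PSI is 0
print(population_stability_index(baseline, shifted))   # shifted data: well above 0.2
```

In a production pipeline this score would feed the alerting and retriggering workflows rather than a print statement.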
TECHNICAL SKILLS
Programming & Scripting: Python, Shell scripting, PowerShell, YAML, REST APIs
Artificial Intelligence: Agentic AI, LLMs (Large Language Models), OpenAI, LangChain, BERT, GenAI, NLP (Natural Language Processing), Prompt Engineering, RAG (Retrieval-Augmented Generation), IVR, Avaya, Azure Cognitive Services, Gemini, GitHub Copilot, Amazon Bedrock, Amazon Polly, Azure AI Studio, AI Agents, and Azure Copilot.
Machine Learning/Deep Learning: PyTorch, TensorFlow, Azure ML Studio, Amazon SageMaker, Keras, Pandas, NumPy, Scikit-learn, Matplotlib, SciPy, Seaborn, NLTK, spaCy
Public Cloud: GCP, Azure, OCI, Snowflake, AWS
Monitoring Tools: ITCAM, SolarWinds, Azure Monitor, Power BI, Splunk, Prometheus, FinOps, Sentinel, Dynatrace, AppDynamics
Databases: SQL Server, PostgreSQL, MongoDB, MySQL, Cosmos DB, Cassandra
Cloud: Azure, AKS, CI Build Server, Azure Repos, Azure Storage, Azure DevOps, Azure Security, ARM templates, Azure AD, Azure PaaS services (Azure Functions, Logic Apps, Event Grid), Azure Synapse, Azure Databricks, Azure Data Lake, Azure Fabric
DevOps Tools: Jenkins, Ansible, Docker, Terraform, Azure DevOps
Containers: Docker, AKS (Azure Kubernetes Service), Kubernetes, EKS, GKE
SDLC: Agile, Scrum, SRE, Kanban
Version Control: Git, TFS, SVN, GitHub, GitHub Actions, GitLab
EDUCATION
Master's in Computer Science – St. Mary's University, San Antonio, TX – completed 2015
Bachelor's in Electrical Engineering – Osmania University, Hyderabad, India – completed 2012
CERTIFICATIONS
Designing and Implementing Microsoft DevOps Solutions (AZ-400)
SnowPro Core Certification (Snowflake)
Certified Kubernetes Application Developer (CKAD)
Associate Cloud Engineer Certification (GCP)
AWS Certified AI Practitioner
Certified Kubernetes Administrator (CKA)
Azure AI Fundamentals (AI-900)
Prompt Engineering with ChatGPT
WORK EXPERIENCE
Client: Pegasystems Inc.
Duration: September 2022 - present
Role: Platform MLOps Engineer / Consultant
Responsibilities:
Designed and implemented cloud-native MLOps platforms across Azure, AWS, and GCP to centralize model lifecycle management, automate deployments, and standardize production AI operations across business units.
Built end-to-end CI/CD/CT pipelines using Azure DevOps and GitHub Actions to automate data validation, model packaging, integration testing, and controlled deployment to staging and production environments.
Designed, developed, and deployed advanced Generative AI models using OpenAI, Hugging Face Transformers, LangChain, LlamaIndex, TensorFlow, and PyTorch, enabling intelligent automation, conversation systems, and enterprise-grade NLP capabilities.
Built fully automated, end-to-end GenAI pipelines that covered data extraction, preprocessing, fine-tuning, hyperparameter optimization, evaluation, and scalable model serving for production environments.
Engineered robust Retrieval-Augmented Generation (RAG) systems by integrating vector databases such as Pinecone, ChromaDB, Milvus, Weaviate, and FAISS, ensuring high-accuracy grounding, reduced hallucinations, and contextually relevant responses.
Created reusable and modular prompt engineering frameworks, including dynamic prompt templates, multi-step prompt chains, autonomous agents, and workflow automations to standardize LLM interactions across multiple business units.
Implemented full LLMOps lifecycle practices such as model versioning, registry management, performance benchmarking, quality monitoring, drift detection, and governance enforcement for safe and reliable AI model deployments.
Integrated Generative AI models into enterprise applications by building scalable APIs, microservices, containerized deployments, and event-driven workflows using cloud-native technologies and modern ML stacks.
Applied extensive hands-on experience with LLMs — including OpenAI GPT series, Anthropic Claude, Meta Llama, Mistral, and custom fine-tuned transformer models — to evaluate capability differences, optimize usage patterns, and select the best model for each use case.
Developed embeddings and semantic search pipelines using Hugging Face models, sentence transformers, and vector databases to support intelligent document retrieval, classification, ranking, and context injection for LLM tasks.
Utilized Python, advanced AI libraries (Transformers, LangChain, PyTorch Lightning, OpenAI/Anthropic SDKs), and cloud compute resources to prototype, test, train, deploy, and monitor high-performing AI solutions rapidly and efficiently.
Built scalable AI workloads using AWS Bedrock, Amazon SageMaker, Azure OpenAI, and Google Vertex AI, aligning enterprise architecture with model availability, inference requirements, and cost optimization strategies.
Partnered with data engineering teams to enhance data quality, build feature pipelines, convert unstructured documents into structured embeddings, and curate high-value training datasets for RAG and fine-tuning workflows.
Designed detailed evaluation frameworks for generative models, including metrics for response accuracy, groundedness, consistency, safety, hallucination mitigation, and domain-specific performance measurement.
Optimized AI inference performance by applying prompt strategies, context window tuning, batching, request routing, caching layers, and response post-processing techniques to improve speed and cost efficiency.
Worked directly with cross-functional engineering, data science, product, and business teams to identify value-driven AI use cases, validate solution feasibility, and design production-grade systems that align with organizational goals.
Ensured all deployed Generative AI and RAG architectures followed industry best practices, responsible AI guidelines, data security protocols, cloud governance frameworks, and enterprise compliance requirements.
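The retrieval step of the RAG systems described above reduces to a nearest-neighbour search over embeddings. A toy sketch with hand-made 3-dimensional vectors follows; a real deployment would use sentence-transformer embeddings and a vector database such as Pinecone or FAISS, and the document ids here are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve(query_vec, store, k=2):
    """Top-k documents by cosine similarity -- the grounding step before prompting the LLM."""
    ranked = sorted(store, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return [doc["id"] for doc in ranked[:k]]

store = [
    {"id": "refund-policy", "vec": [0.9, 0.1, 0.0]},
    {"id": "shipping-faq",  "vec": [0.1, 0.9, 0.1]},
    {"id": "returns-howto", "vec": [0.8, 0.2, 0.1]},
]
print(retrieve([1.0, 0.0, 0.0], store))  # ['refund-policy', 'returns-howto']
```

The retrieved documents are then injected into the prompt as grounding context, which is what reduces hallucinations in the deployed systems.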
Environment:
LLMs, LangChain, Amazon Bedrock, Amazon Polly, OpenAI, AWS (Athena, Glue, Step Functions), GitHub, AWS Lambda, PySpark, BERT, GenAI, NLP (Natural Language Processing), Prompt Engineering, RAG, IVR, Azure Cognitive Services.
Client: DXC Boston
Duration: December 2019 – September 2022
Role: Cloud Consultant Engineer
Responsibilities:
Automated provisioning of cloud environments using Terraform modules, enabling consistent, repeatable deployment of VMs, VNets/VPCs, subnets, load balancers, storage, and identity components across Azure, AWS, and GCP.
Built fully automated infrastructure blueprints for Databricks, Snowflake, ADF, AKS clusters, and Azure ML workspaces to accelerate environment creation and reduce manual configuration drift.
Implemented event-driven automation using Azure Functions, AWS Lambda, and GCP Cloud Functions to trigger data workflows, alerts, resource cleanup, compliance scans, and policy enforcement.
Configured cloud-native monitoring, logging, and alerting automation (Azure Monitor, CloudWatch, GCP Operations Suite) with auto-remediation patterns to reduce operational incidents.
Designed and implemented CI/CD pipelines using Azure DevOps and GitHub Actions for deploying infrastructure, Databricks notebooks, ETL code, APIs, and Kubernetes workloads.
Automated artifact versioning, dependency management, and deployment packaging using Azure Artifacts, GitHub Packages, and container registries (ACR/ECR/GCR).
Established automated security scanning, linting, and IaC policy enforcement using Checkov, OPA, and Azure Policy to ensure compliance before deployment.
Coordinated cross-team DevOps workflows with sprint planning, environment readiness, release approvals, and deployment gates for cloud applications and data platforms.
Implemented GitOps pipelines for infrastructure and Kubernetes deployments using ArgoCD and FluxCD, ensuring declarative configuration, automatic reconciliation, and rapid rollback capability.
Built version-controlled, environment-specific GitOps repositories to manage manifests, Helm charts, IaC templates, and configuration parameters for dev/stage/prod environments.
Automated cluster configuration management using Helm, enabling consistent rollout of monitoring agents, secrets stores, ingress controllers, and service mesh components across clusters.
Designed, deployed, and managed secure AKS clusters with RBAC, pod security policies, network policies, node pools, GPU integrations, and private clusters.
Implemented end-to-end Kubernetes deployment automation using Helm, Kustomize, GitOps, and CI/CD pipelines to deploy microservices and shared platform components.
Integrated Kubernetes services with Azure Key Vault, IAM roles, ConfigMaps, and Secrets to support secure environment variable management and application configuration.
Integrated Kubernetes-based model serving pipelines to deploy containerized ML inference services with autoscaling, health checks, and SLA monitoring.
Automated model promotion across dev, stage, and prod environments using CI/CD policies, version-controlled registries, and environment validation checks.
Built monitoring dashboards for ML inference services using Prometheus/Grafana and Azure Monitor to track latency, error rates, throughput, and resource usage.
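The GitOps reconciliation that ArgoCD and FluxCD perform, as described above, can be illustrated with a small sketch: diff the desired state rendered from Git against the live cluster state and emit corrective actions. The resource names and specs below are made up for illustration:

```python
def reconcile(desired, live):
    """Diff desired manifests (from Git) against live state and list corrective actions.

    Both arguments are {resource_name: spec} maps, like rendered Kubernetes manifests.
    """
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))
    for name in live:
        if name not in desired:
            actions.append(("delete", name))  # prune resources removed from Git
    return sorted(actions)

desired = {"api": {"replicas": 3}, "worker": {"replicas": 1}}
live = {"api": {"replicas": 2}, "legacy-job": {"replicas": 1}}
print(reconcile(desired, live))
# [('create', 'worker'), ('delete', 'legacy-job'), ('update', 'api')]
```

Running this loop continuously is what gives the declarative configuration, automatic reconciliation, and rapid rollback properties described above: rolling back is just pointing `desired` at an earlier Git revision.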
Environment:
Azure DevOps, GitHub Actions, Terraform, Azure Functions, AWS Lambda, GCP Cloud Functions, ArgoCD, FluxCD, Helm, Kubernetes (AKS/EKS/GKE), Azure Monitor, AWS CloudWatch, GCP Operations Suite, Azure Key Vault, AWS Secrets Manager, GCP Secret Manager, Prometheus, Grafana, Databricks, Snowflake, Azure Data Factory, BigQuery, PySpark, MLflow.
Client: CVS RI (Rhode Island)
Duration: January 2017 – November 2019
Role: Azure Consultant
Responsibilities:
Automated provisioning of Azure IaaS resources (VMs, VNets, Load Balancers, Storage, NSGs) using PowerShell, ARM templates, and Azure CLI to standardize cloud infrastructure deployment.
Designed and built Azure Data Factory (ADF) ETL/ELT pipelines for ingestion, transformation, and orchestration of data from SQL Server, Blob Storage, and SaaS sources.
Developed ADF pipelines for batch, micro-batch, and real-time data movement integrating on-premises systems, API endpoints, and cloud-native storage layers.
Created ADF Mapping Data Flows for cleansing, schema mapping, type enforcement, and enrichment of datasets consumed by downstream analytics platforms.
Implemented complete pipeline automation using triggers, schedules, dependency chains, and event-based orchestration for zero-manual data movement.
Integrated ADF with Azure Key Vault to automate secret retrieval and securely manage connection strings, keys, passwords, and service credentials.
Built reusable ADF Custom Activities using Python and Azure Functions to handle complex transformations, API calls, and cross-cloud integrations.
Modernized legacy SSIS/ETL jobs by reengineering them into ADF pipelines, improving performance, maintainability, and scaling capabilities.
Developed and optimized Azure Synapse pipelines, data flows, and SQL Pools, improving query performance, parallelism, and distributed data processing.
Established automated DevOps pipelines (CI/CD) using Azure DevOps for deploying ADF artifacts, Synapse scripts, and infrastructure templates to dev/stage/prod.
Migrated build and test environments to Azure using CI/CD automation, reducing deployment effort and improving release consistency across environments.
Implemented automated build validation, dependency checks, schema validation, and artifact versioning using Azure DevOps and Git.
Managed multi-cloud data movement by integrating ADF with AWS S3, Lambda, and Glue, enabling hybrid Azure–AWS ingestion patterns where required.
Designed and deployed containerized ETL utilities using Docker and orchestrated them on Kubernetes (AKS) for scalable processing of high-volume workloads.
Automated Kubernetes deployments using Helm, YAML, and GitOps workflows, enabling reproducible deployments for API services and ETL microservices.
Configured Azure Monitor, Log Analytics, ADF Monitoring, and Application Insights for pipeline observability, alerting, and run-level diagnostics.
Implemented data quality checks and metadata validation using ADF assertions, custom scripts, and automated error-handling logic.
Improved cost efficiency by optimizing pipeline concurrency, integration runtimes, cluster scaling, and scheduling windows across ADF and Synapse.
Troubleshot and resolved source control conflicts in Git/SVN, enabling smooth collaboration across developers, testers, and data engineers.
Built reliable, reproducible build pipelines with automated testing, integration checks, and environment-based release strategies to ensure consistent delivery of data solutions.
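The data quality checks described above follow a common gate pattern: validate each ingested row against required columns and rule predicates before loading. A minimal sketch, with hypothetical column names and rules:

```python
def run_quality_checks(rows, required, rules):
    """Split ingested rows into loadable rows and (index, problems) failures."""
    passed, failures = [], []
    for i, row in enumerate(rows):
        # Required columns must be present and non-empty.
        problems = [f"missing:{col}" for col in required if row.get(col) in (None, "")]
        # Named rule predicates must all hold for the row.
        problems += [name for name, rule in rules.items() if not rule(row)]
        if problems:
            failures.append((i, problems))
        else:
            passed.append(row)
    return passed, failures

rows = [
    {"member_id": "A1", "claim_amount": 120.0},
    {"member_id": "",   "claim_amount": 80.0},
    {"member_id": "A3", "claim_amount": -5.0},
]
rules = {"non_negative_amount": lambda r: r.get("claim_amount", 0) >= 0}
good, bad = run_quality_checks(rows, required=["member_id"], rules=rules)
print(len(good), bad)
# 1 [(1, ['missing:member_id']), (2, ['non_negative_amount'])]
```

In the ADF pipelines this logic would live in assertions or a custom activity, with the failure list driving the automated error-handling and alerting paths.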
Environment:
Azure Data Factory, Azure Synapse, Azure Functions, Azure Monitor, Azure Key Vault, Azure DevOps, PowerShell, ARM Templates, YAML Pipelines, Git, SVN, Docker, Kubernetes (AKS), Helm, GitOps, AWS S3, AWS Lambda, AWS Glue, GCP BigQuery, GCP Cloud Functions, Databricks, PySpark, SQL Server, Blob Storage, Application Insights
Client: AIG NC (North Carolina)
Duration: March 2015 – December 2016
Role: Cloud Operations Engineer
Responsibilities:
Automated provisioning of AWS infrastructure using EC2, S3, IAM, VPC, CloudFormation, and AWS CLI to support scalable application environments.
Built and maintained CI/CD pipelines in Jenkins for automated build, test, packaging, and deployment workflows across multiple stages.
Integrated Jenkins pipelines with AWS CodeDeploy and S3 for artifact versioning, deployment automation, and environment consistency.
Wrote automation scripts using Python and Bash to streamline operational activities, environment setup, and configuration management.
Implemented continuous monitoring and alerting using Amazon CloudWatch to improve observability, log tracking, and system reliability.
Configured and managed AWS networking components such as VPCs, subnets, routing, load balancers, and security groups.
Automated backup, archival, and retention workflows using AWS Backup, S3 lifecycle configurations, and Lambda-based scheduled jobs.
Implemented secure DevOps practices including IAM roles, key rotation, encryption policies, and controlled access to production environments.
Created and maintained Jenkins shared libraries for reusable pipeline logic and standardized CI/CD processes across teams.
Collaborated with engineering teams to improve deployment workflows, troubleshoot pipeline failures, and enhance cloud deployment efficiency.
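The backup, archival, and retention workflows above reduce to age-based decisions of the kind S3 lifecycle rules express. A sketch of the decision logic only; the 30-day and 365-day windows are illustrative, and a real job would enumerate objects through the AWS APIs rather than hard-coded dates:

```python
from datetime import date, timedelta

def lifecycle_action(last_modified, today, archive_after=30, delete_after=365):
    """Mirror an S3 lifecycle rule: keep hot objects, archive cold ones, delete expired ones."""
    age_days = (today - last_modified).days
    if age_days >= delete_after:
        return "delete"
    if age_days >= archive_after:
        return "archive"
    return "keep"

today = date(2016, 6, 1)
print(lifecycle_action(today - timedelta(days=3), today))    # keep
print(lifecycle_action(today - timedelta(days=90), today))   # archive
print(lifecycle_action(today - timedelta(days=400), today))  # delete
```

Encoding the same windows directly in S3 lifecycle configuration, with a Lambda handling the cases the rules cannot express, matches the pattern described above.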
Environment:
AWS (EC2, VPC, S3, IAM, CloudFormation, CloudWatch, Lambda), Jenkins, Python, Bash, Git, AWS CLI, CodeDeploy, CodeCommit, Load Balancers, S3 Artifact Repositories.