Senior Cloud Platform

Location:
West Chester, PA
Posted:
May 18, 2026

Resume:

Kiran Peddi

SENIOR DEVOPS ENGINEER | AIOPS | DEVOPS | SECOPS | MLOPS | DISTRIBUTED CLOUD SYSTEMS

M: +1-929-***-****
E: ***********@*****.***
L: www.linkedin.com/in/kiranp02
Visa: Green Card (No Sponsorship Required) | Public Trust Eligible

PROFESSIONAL SUMMARY

●Senior DevOps, SRE, and Platform Engineering specialist with expertise in Kubernetes platform engineering, multi-cloud infrastructure automation, GitOps delivery systems, distributed systems reliability, and enterprise-scale cloud-native operations.

●Engineered full-spectrum Ops ecosystems spanning DevOps, DataOps, MLOps, AIOps, SecOps, ModelOps, and ITOps across multi-cloud Kubernetes platforms.

●Architected and operated multi-cloud infrastructure across AWS, Azure, and GCP supporting AI/ML platforms, healthcare systems, financial services, SaaS applications, and enterprise distributed workloads.

●Specialized in Kubernetes platform engineering across EKS, AKS, and GKE including cluster lifecycle management, workload isolation, autoscaling, node lifecycle automation, GPU orchestration, and production reliability engineering.

●Designed Infrastructure-as-Code frameworks using Terraform, OpenTofu, Terragrunt, CloudFormation, and Ansible enabling reusable provisioning, immutable infrastructure, drift remediation, and multi-account governance.

●Built enterprise CI/CD and GitOps delivery ecosystems using GitHub Actions, Jenkins, GitLab CI/CD, Helm, Flux CD, and ArgoCD supporting secure release pipelines, artifact governance, rollback automation, and blue/green deployment strategies.

●Implemented SRE reliability engineering practices including SLO/SLI governance, error budgets, incident response, chaos engineering, capacity planning, disaster recovery testing, observability-driven operations, and distributed systems optimization.

●Engineered AIOps observability platforms using Prometheus, Grafana, OpenTelemetry, Datadog, ELK, Splunk, CloudWatch, and OpenSearch enabling intelligent alert correlation, anomaly detection, telemetry-driven RCA, and production stability engineering.

●Built GPU-enabled Kubernetes and AWS Batch platforms supporting AI inference, LLM observability, model telemetry, MLOps pipelines, asynchronous orchestration, and distributed compute workloads.

●Hardened cloud-native platforms using IAM least-privilege enforcement, OPA Gatekeeper, Kyverno, Vault, SBOM generation, Cosign image signing, secrets management, runtime security, and Zero Trust governance frameworks.

●Experienced in Python, Bash, Go, and TypeScript automation for backend services, infrastructure SDKs, cloud operations, event-driven systems, and platform engineering enablement.

TECHNICAL SKILLS

Category: Tools / Technologies

Cloud Platforms: AWS (EC2, EKS, ECS, Lambda, IAM, VPC, S3, RDS, CloudWatch, Route53, KMS, Transit Gateway), Azure (AKS, Azure AD, Azure Monitor, Functions), GCP (GKE, Compute Engine, Cloud Monitoring, IAM), OpenStack

Kubernetes & Containers: Kubernetes (EKS, AKS, GKE), Docker, Helm, ECS, Kubernetes RBAC, Admission Controllers, HPA/VPA, Cluster Autoscaler, Stateful Workloads, Multi-tenant Clusters, Service Mesh Concepts (Istio/Linkerd), Network Policies

Infrastructure as Code: Terraform, OpenTofu, Terragrunt, CloudFormation, ARM Templates, Deployment Manager, Ansible, Terraform Modules, Terratest Concepts, Drift Detection, Immutable Infrastructure

CI/CD & GitOps: GitHub Actions, Jenkins, GitLab CI/CD, ArgoCD, Flux CD, Azure DevOps, Blue/Green Deployments, Release Engineering, Rollback Automation, Artifact Promotion Pipelines

Programming & Scripting: Python, Bash, PowerShell, Go, TypeScript, Groovy, YAML, JSON

Monitoring & Observability: Prometheus, Grafana, Datadog, ELK Stack, Splunk, OpenTelemetry, CloudWatch, OpenSearch, Dynatrace, AIOps, Intelligent Alert Correlation, Distributed Tracing, Telemetry Pipelines

DevSecOps & Security: IAM, Azure AD, GCP IAM, Vault, OPA Gatekeeper, Kyverno, SBOM, Cosign, SAST, SCA, Secrets Management, Runtime Security, IaC Security Scanning, Zero Trust, Compliance Automation, FedRAMP, SOC2, HIPAA

AI/ML Infrastructure: GPU Kubernetes Workloads, AWS Batch, LLM Observability, Model Serving, MLOps, Inference Telemetry, Token Tracking, AI Anomaly Detection, Agentic AI Observability

Networking: TCP/IP, DNS, SSL/TLS, VPN, VPC, Transit Gateway, BGP, VXLAN, Hybrid Connectivity, Network Segmentation, Zero Trust Networking

Databases: PostgreSQL, MySQL, DynamoDB, MongoDB, Cassandra, SQL Server, Oracle

Operating Systems: Linux (RHEL, Ubuntu, CentOS), UNIX, Windows Server, macOS

Configuration Management: Ansible, Puppet, Chef, SaltStack

Container Registries: Amazon ECR, Docker Hub, Google Container Registry (GCR), Azure Container Registry (ACR), JFrog Artifactory

Collaboration & ITSM: Jira, Confluence, PagerDuty, Slack, Microsoft Teams

PROFESSIONAL EXPERIENCE

Senior Platform Engineer | Mayo Clinic – Rochester, MN | May 2023 – Present

●Owned SLOs, SLIs, and error budgets for 40+ production services running on AWS EKS across multi-region healthcare environments.

●Architected GPU-enabled Kubernetes infrastructure supporting AI-driven clinical imaging and MLOps inference workloads.

●Built LLM observability frameworks tracking latency, token usage, inference throughput, and agent completion telemetry.

●Designed active-active multi-region AWS disaster recovery architecture with automated failover testing and operational runbooks.

●Implemented AIOps observability pipelines using Prometheus, Grafana, OpenTelemetry, Datadog, CloudWatch, and OpenSearch.

●Standardized Terraform, OpenTofu, and Terragrunt modules for multi-account AWS infrastructure provisioning and governance.

●Automated GitOps delivery pipelines using GitHub Actions, Jenkins, Helm, and ArgoCD for Kubernetes deployment automation.

●Engineered event-driven distributed systems using SQS, SNS, Lambda, DynamoDB, and asynchronous workflow orchestration.

●Built an Internal Developer Platform (IDP) using Backstage, enabling self-service provisioning and Golden Path deployment workflows.

●Enforced DevSecOps controls using OPA Gatekeeper, IAM least-privilege policies, SBOM generation, Cosign signing, and secrets management.

●Developed Python and TypeScript automation services supporting distributed healthcare imaging and backend orchestration systems.

●Implemented Kubernetes autoscaling, workload isolation, node lifecycle automation, and runtime policy enforcement across EKS clusters.

●Led incident response, root cause analysis, and telemetry-driven production reliability engineering for mission-critical systems.

●Built chaos engineering and failure injection frameworks validating resilience and graceful degradation across distributed workloads.

●Managed PostgreSQL and DynamoDB-backed distributed systems requiring high availability, recovery automation, and operational stability.

Environment: AWS, Azure, GCP, Kubernetes, EKS, Docker, Terraform, OpenTofu, Terragrunt, Helm, GitHub Actions, ArgoCD, Jenkins, GitLab CI, Python, TypeScript, Prometheus, Grafana, Datadog, OpenTelemetry, CloudWatch, OpenSearch, PostgreSQL, DynamoDB

Cloud Infrastructure Engineer | Clario – Philadelphia, PA | September 2021 – May 2023

●Architected and operated multi-region AWS infrastructure supporting regulated SaaS and healthcare workloads.

●Built Kubernetes-based application platforms on EKS with Helm deployment standardization and workload orchestration.

●Developed Terraform module libraries implementing reusable Infrastructure-as-Code and GitOps governance workflows.

●Implemented Datadog observability platforms including distributed tracing, telemetry pipelines, SLO dashboards, and centralized alerting.

●Automated CI/CD delivery pipelines using Jenkins, GitLab CI, GitHub Actions, and container release automation.

●Designed blue/green deployment workflows enabling controlled environment promotion and rollback strategies.

●Hardened cloud security posture using IAM least-privilege access, Vault secrets management, and policy validation controls.

●Implemented container vulnerability scanning, artifact governance, and secure image promotion pipelines.

●Built Prometheus and Grafana monitoring systems supporting production telemetry aggregation and proactive alerting.

●Engineered asynchronous distributed systems using SQS, SNS, Lambda, and event-driven backend orchestration patterns.

●Supported GPU-backed AWS compute environments for AI model serving and ML inference reliability engineering.

●Managed PostgreSQL and DynamoDB persistence layers supporting distributed healthcare backend services.

●Eliminated infrastructure drift using Terraform state validation, automated compliance checks, and GitOps enforcement.

●Led production incident response, disaster recovery testing, and structured post-incident root cause analysis activities.

●Resolved high-severity cloud networking, Kubernetes runtime, and distributed queue-processing failures across production systems.

Environment: AWS, EKS, Terraform, OpenTofu, Jenkins, GitLab CI, GitHub Actions, Docker, Helm, Prometheus, Grafana, Datadog, Vault, Lambda, PostgreSQL, DynamoDB, Linux, Python

Cloud Infrastructure Engineer | Comcast – West Chester, PA | February 2018 – August 2021

●Designed Kubernetes-based container platforms on AWS EKS supporting enterprise microservices and distributed applications.

●Architected multi-region AWS networking using Transit Gateway, VPC segmentation, centralized egress, and hybrid connectivity.

●Built Terraform Infrastructure-as-Code frameworks with module versioning, validation pipelines, and GitOps workflows.

●Implemented Kubernetes RBAC, namespace isolation, and network policies enforcing workload security boundaries.

●Engineered CI/CD pipelines using Jenkins, GitHub Actions, Docker, and Helm for automated deployment workflows.

●Developed multi-cloud infrastructure solutions across AWS, Azure, and GCP supporting enterprise-scale distributed systems.

●Implemented Prometheus, Grafana, ELK, and Splunk observability platforms for centralized telemetry and monitoring.

●Automated cloud provisioning and operational workflows using Python, Bash, Terraform, and Ansible.

●Built disaster recovery and failover architectures improving resilience for mission-critical production services.

●Hardened security posture using IAM governance, Kubernetes RBAC, runtime access controls, and network segmentation.

●Managed PostgreSQL-backed stateful services deployed across Kubernetes runtime environments.

●Optimized containerized deployment strategies improving workload portability, scalability, and operational consistency.

●Eliminated configuration drift through infrastructure validation, governance automation, and standardized provisioning.

●Supported distributed asynchronous systems and resilient workload orchestration across cloud-native platforms.

●Troubleshot Kubernetes scheduling, ingress networking, distributed services, and production infrastructure failures.

Environment: AWS, Azure, GCP, EKS, Docker, Kubernetes, Terraform, Ansible, Jenkins, GitHub Actions, Helm, ELK Stack, Splunk, Prometheus, Grafana, PostgreSQL, Linux, Python

DevOps Engineer / AWS Engineer | JPMorgan Chase – Westerville, OH | May 2016 – January 2018

●Engineered AWS infrastructure supporting distributed banking and financial transaction systems.

●Built Infrastructure-as-Code automation using Terraform, CloudFormation, and Ansible for repeatable cloud provisioning.

●Developed CI/CD pipelines using Jenkins, GitHub, and AWS CodeDeploy for automated application deployment.

●Implemented Docker-based containerization workflows supporting microservices modernization initiatives.

●Built Kubernetes orchestration and deployment automation for scalable enterprise application workloads.

●Standardized AWS VPC layouts, IAM structures, security groups, and infrastructure governance across environments.

●Implemented CloudWatch monitoring, SNS alerting, and telemetry-driven operational visibility for production systems.

●Developed Python and Bash operational tooling supporting infrastructure diagnostics and deployment automation.

●Hardened cloud environments using IAM least-privilege access, encrypted workloads, and policy-driven governance.

●Automated release workflows with rollback controls, artifact versioning, and deployment validation pipelines.

●Supported PostgreSQL database infrastructure for transactional enterprise applications.

●Improved cloud networking reliability using VPC segmentation and secure connectivity patterns.

●Established monitoring and logging systems supporting incident response and production operations.

●Investigated distributed transaction failures, asynchronous processing delays, and infrastructure resource bottlenecks.

●Provided production reliability engineering and operational support for enterprise banking platforms.

Environment: AWS, Terraform, CloudFormation, Docker, Kubernetes, Jenkins, GitHub, CodeDeploy, CloudWatch, PostgreSQL, Linux, Python, Bash, Ansible

IT Infrastructure Specialist / Build & Release Engineer | FedEx – Orlando, FL | September 2015 – April 2016

●Administered Linux and Windows infrastructure supporting enterprise logistics and operational systems.

●Automated infrastructure administration workflows using Bash scripting and configuration management tooling.

●Implemented centralized monitoring and alerting systems using Nagios for production infrastructure health validation.

●Supported hybrid infrastructure environments spanning datacenter and early cloud adoption systems.

●Automated build and release engineering workflows using Jenkins and deployment automation pipelines.

●Supported AWS EC2 and VPC provisioning enabling scalable cloud infrastructure adoption.

●Managed Linux production systems supporting enterprise operational workloads and backend services.

●Developed deployment validation pipelines improving release consistency and rollback capabilities.

●Assisted migration initiatives transitioning legacy operational systems toward cloud-native infrastructure patterns.

●Supported monitoring, logging, and telemetry aggregation pipelines for operational visibility.

●Assisted with infrastructure hardening, patch management, and access governance activities.

●Participated in incident response and troubleshooting for networking, compute, and runtime infrastructure failures.

●Supported enterprise database operations and operational reporting environments.

●Improved deployment reliability through standardized automation scripts and release workflows.

●Maintained operational continuity across distributed enterprise systems requiring high availability.

Environment: Linux, Windows Server, Jenkins, AWS EC2, VPC, Bash, PowerShell, Puppet, SVN, TFS, WebSphere

Jr. Linux / UNIX Administrator & Build Engineer | Genpact – Bangalore, India | July 2013 – August 2014

●Administered Linux and UNIX server environments supporting enterprise application workloads.

●Performed system configuration, patch management, capacity monitoring, and operational maintenance activities.

●Automated repetitive administration tasks using Bash shell scripting and operational tooling.

●Supported Jenkins-based build and deployment automation workflows for enterprise applications.

●Maintained application deployment environments using SVN and Git version control systems.

●Supported infrastructure provisioning and server lifecycle management for distributed enterprise systems.

●Assisted with production monitoring, log analysis, and operational telemetry validation.

●Supported SQL database operations and transactional application environments.

●Participated in troubleshooting infrastructure instability, application outages, and networking failures.

●Implemented operational monitoring and alert automation supporting proactive health validation.

●Assisted infrastructure upgrade initiatives involving virtualization and backend application migration activities.

●Managed Linux runtime environments supporting internal business applications and backend operational systems.

●Supported release engineering validation processes across development and testing environments.

●Improved operational efficiency through automation of recurring infrastructure workflows.

●Enforced standardized deployment, configuration, and runtime governance procedures.

Environment: Linux, RHEL, VMware, Tomcat, Apache, MySQL, Jenkins, Git, SVN, Shell Scripting

EDUCATION

Master’s in Electrical and Electronics Engineering, August 2014 to December 2015

University of Hartford, West Hartford, CT, U.S.

Bachelor of Science in Electronics and Communication Engineering, June 2009 to May 2013

Jawaharlal Nehru Technological University, India.


