HEMANTH K
SRE / DevOps & Cloud Infrastructure Engineer
Austin, TX +1-631-***-**** *****************@*****.***
PROFESSIONAL SUMMARY
Senior SRE & DevOps Engineer with 4+ years building and operating production infrastructure at scale on AWS and GCP. Owned on-call rotation, incident response, and SLO/SLI tracking, maintaining 99.95% uptime while reducing MTTR from 40 to 15 minutes. Architected GitOps-based CI/CD platforms, IaC-driven AWS environments, and Kubernetes-native delivery pipelines serving millions of daily requests. Experienced in HIPAA-regulated, compliance-driven environments with established DevSecOps practices. AWS Certified Solutions Architect – Associate and AWS Certified Machine Learning – Specialty.
CORE TECHNICAL SKILLS
CI/CD & GitOps: Jenkins, ArgoCD, Argo Rollouts, GitHub Actions, FluxCD, Blue-Green & Canary Deployments, Helm, Kustomize
Cloud & IaC: AWS (EC2, EKS, ECS, RDS, S3, IAM, VPC, CloudWatch, Route 53, EventBridge, Step Functions), GCP/GKE, Terraform, CloudFormation, Ansible
Containers & Orchestration: Kubernetes (EKS/GKE/OpenShift), Docker, Karpenter, KEDA, Operators/CRDs, Helm, KubeRay
Observability & SRE: Prometheus, Grafana, Datadog, PagerDuty, Splunk, ELK, OpenTelemetry, SLO/SLI/Error Budget Tracking, Runbook Automation
Security & Compliance: HashiCorp Vault, AWS Secrets Manager, SonarQube (SAST), Trivy, Orca, mTLS, Kubernetes Admission Controllers, DevSecOps
Service Mesh & Networking: Cilium eBPF (L4/L7), Envoy Proxy, Istio, gRPC, HTTP/2, xDS APIs
Databases & Storage: PostgreSQL (RDS, Multi-AZ), DynamoDB, Cassandra, MongoDB, ElastiCache Redis, FSx for Lustre, S3
Languages: Python, Bash, Golang, Perl, YAML, Java
PROFESSIONAL EXPERIENCE
Ameriprise Financial Inc. Austin, TX Aug 2024 – Present
SRE / DevOps & Platform Engineer
• Architected production-grade GitOps deployment infrastructure with ArgoCD managing 20+ Application resources across dev/staging/production clusters. Enforced change-management discipline through branch-based promotion, automated health assessments, and continuous reconciliation of live cluster state against Git. Reduced deployment-related production incidents by 70% through rollout guardrails and automated drift detection with self-healing.
• Engineered progressive delivery with Argo Rollouts: blue-green deployments with dual-ReplicaSet traffic switching, and canary rollouts gated by AnalysisTemplates that query Prometheus SLI metrics (error rate, p99 latency) to drive automated promotion and rollback. Replaced manual release decisions with enforced quality gates across all production tiers.
• Owned SRE responsibilities for a high-throughput production platform serving millions of daily requests. Maintained a 99.95% uptime SLO while managing on-call rotation (every third week), incident response, and postmortems. Reduced MTTR from 40 to 15 minutes through automated runbooks, comprehensive observability (Prometheus, Datadog, Splunk), and PagerDuty alerting tied to application-level SLIs (p50/p95/p99 latency, error rate, throughput).
• Built and maintained EKS infrastructure with Karpenter and KEDA autoscaling driven by Prometheus queue-depth metrics. Eliminated idle compute spend while meeting a sub-90s cold-start SLO. Managed end-to-end cluster reliability, including node provisioning, health monitoring, pod scheduling, and distributed-system SLOs.
• Led full-stack performance debugging to eliminate tail-latency bottlenecks through Linux kernel analysis, container runtime investigation, and distributed tracing. Optimized the EKS data plane by replacing kube-proxy with Cilium eBPF for L4 and Envoy proxy for L7, improving network throughput and reducing p99 latency by 55%.
• Built event-driven EKS workflows using AWS Step Functions and EventBridge with IRSA (IAM Roles for Service Accounts), fully modularized in Terraform. Orchestrated scalable workloads triggered by S3 input, reducing event-processing and job-launch latency by 30% while preserving compliance audit trails through GitOps version control.
• Maintained DevSecOps posture across all production workloads — enforced Kubernetes admission controllers (no privileged containers, image signature verification, resource limits), integrated automated SAST scanning (SonarQube) and container vulnerability detection (Orca/Trivy) into CI/CD pipelines, reducing high-severity CVEs by 45% and maintaining compliance audit trails through immutable infrastructure patterns.
• Managed the secrets lifecycle with HashiCorp Vault and AWS Secrets Manager. Configured Vault's Kubernetes auth method alongside fine-grained IRSA policies, eliminated hard-coded credentials from all CI/CD pipelines, and enforced dynamic secret rotation across all production services in line with enterprise security standards.
• Built fault-isolation boundaries and tracked error budgets across all production tiers, containing the blast radius of failures so that one workload could not cascade into another. Fed SLO breach reviews into platform improvement cycles to drive accountability and continuous reliability gains.
Centene Corporation Sunrise, FL Feb 2024 – Jul 2025
DevOps / Cloud Engineer
• Architected multi-region fault-tolerant platform on AWS with Route 53 geolocation routing, DynamoDB Global Tables, DAX caching (sub-10ms reads), API Gateway rate limiting with WAF, ALB zero-downtime deployments, and ECS Fargate auto-scaling — achieved 99.97% uptime validated through chaos engineering.
• Architected highly available Amazon RDS for PostgreSQL with Terraform: Multi-AZ deployment, point-in-time recovery (PITR), and read replicas seeded from the standby instance to isolate analytical workloads and reduce I/O load on the primary.
• Integrated DevSecOps into production CI/CD pipelines — automated SAST scanning (SonarQube), container vulnerability detection and cloud security posture management (Orca), reducing high-severity CVEs by 45%; implemented Kubernetes admission controllers enforcing security policies with automated remediation workflows for zero-downtime CVE patching.
• Developed Ansible compliance playbooks to automate security hardening: enforced file permissions, automated patching cycles, prevented configuration drift, and maintained standardized baseline configurations. Reduced security audit findings by 60% across the Linux server fleet.
• Developed hybrid infrastructure automation using Terraform and Ansible to provision infrastructure and application stacks consistently across AWS and GCP, reducing manual intervention by 70%. Maintained Jenkins multibranch pipeline configurations covering dev-to-prod deployment stages for 15+ microservices.
• Deployed ElastiCache for Redis in cluster mode (99.2% cache hit rate), circuit-breaker patterns via the Istio service mesh to prevent cascading failures, and mTLS encryption for all service-to-service communication, maintaining compliance across all workloads in a HIPAA-regulated environment.
• Managed SQL and NoSQL databases (PostgreSQL, Cassandra, MongoDB) across multiple data centers. Configured replication topologies for high availability and low-latency reads, and collaborated with developers on schema changes and query optimization.
• Designed petabyte-scale data processing infrastructure with Kinesis Data Streams, Lambda, DynamoDB Global Tables with DAX, EMR Spark orchestrated by Step Functions, and S3 Cross-Region Replication with Glacier lifecycle policies, achieving sub-5-minute end-to-end latency and reducing storage costs by 40%.
State of Georgia Atlanta, GA Jun 2022 – Aug 2023
DevOps Automation & Cloud Infrastructure Engineer
• Designed fully automated server build, monitoring, and deployment solutions spanning Jenkins CI, Docker containerization, and multi-platform infrastructure (AWS EC2, VMware) — standardized deployment patterns across heterogeneous environments.
• Built event-driven EKS workflows using AWS Step Functions and EventBridge with IRSA, fully modularized in Terraform. Orchestrated scalable event-based data processing triggered by S3 input, reducing event-processing and job-launch latency by 30%.
• Managed and troubleshot Red Hat OpenShift containerized applications; configured Grafana and Graylog dashboards for application-level and infrastructure-level alerting; maintained centralized logging with structured JSON log pipelines and custom parsing rules.
• Implemented token-based secrets management with HashiCorp Vault. Configured the Kubernetes auth method with fine-grained access policies, eliminated hard-coded credentials from all CI/CD pipelines, and established dynamic secret rotation for all production services.
• Maintained GKE and EKS deployments across multiple regions with centralized S3 storage; managed Cassandra and MongoDB replication across multiple data centers with zero-downtime topology changes.
• Implemented AWS CloudWatch monitoring and alerting; managed S3, IAM policies, AMIs, and snapshots for backup and disaster recovery — maintained operational visibility across all infrastructure tiers.
EDUCATION
University of Texas at Arlington Aug 2023 – 2025
Master of Science, Data Science
CERTIFICATIONS
• AWS Certified Solutions Architect – Associate
• AWS Certified Machine Learning – Specialty
• NVIDIA Certified Professional: Generative AI