DevOps/SRE Engineer - Sujith Kumar Koppara

Location: San Jose, CA, 95115

Posted: March 20, 2026


Resume:

DevOps Engineer/SRE

Name: Sujith Kumar Koppara

Email: ********************@*****.*** Ph. No: 913-***-****

PROFESSIONAL SUMMARY:

●DevOps/SRE Engineer with 5+ years of experience designing, automating, and scaling highly available, fault-tolerant cloud-native systems across multi-cloud and hybrid environments.

●Proven expertise in building CI/CD ecosystems from scratch, reducing deployment time by 60–80% through automation and standardized release frameworks.

●Strong hands-on experience in AWS, Azure, and containerized microservices architectures using Kubernetes, Docker, and service mesh implementations.

●Architected resilient infrastructure using Infrastructure as Code (Terraform, CloudFormation), ensuring immutable, version-controlled, and repeatable environments.

●Deep experience implementing SRE principles including SLIs, SLOs, error budgets, observability engineering, and incident response automation.

●Designed proactive monitoring and alerting strategies using Prometheus, Grafana, ELK, Datadog, and Splunk to minimize MTTR and production outages.

●Strong background in high-availability design, auto-scaling strategies, disaster recovery, backup orchestration, and multi-region deployments.

●Expertise in DevSecOps practices including vulnerability scanning, secrets management, IAM hardening, and container security enforcement.

●Implemented and managed service mesh architectures to improve microservices communication, observability, and traffic management across Kubernetes-based platforms.

●Hands-on experience with the OSHIP Popcorn platform, supporting cloud-native application deployment, operational automation, and enterprise DevOps workflows.

●Built scalable CI/CD and platform engineering solutions integrating service mesh capabilities for secure service-to-service communication and enhanced reliability.

●Experience optimizing cloud cost using FinOps strategies, rightsizing, reserved instance planning, and workload re-architecture.

●Automated configuration management and environment provisioning using Ansible, Chef, and shell scripting, eliminating manual operational overhead.

●Built centralized logging, tracing, and metrics platforms improving system visibility and reducing incident resolution time by over 40%.

●Led root cause analysis (RCA), postmortems, and reliability improvements to strengthen production stability.

●Strong collaboration with developers, QA, security, and product teams to embed DevOps culture and reliability-first engineering practices.

●Experienced in zero-downtime deployments, blue-green and canary release strategies across large-scale enterprise applications.

●Adept at handling high-traffic production workloads, ensuring 99.9%+ uptime in mission-critical systems.

TECHNICAL SKILLS:

Cloud Platforms: AWS (EC2, EKS, ECS, S3, RDS, Lambda, IAM, CloudFront, VPC), Azure (AKS, ACR, DevOps, App Services), GCP (GKE – exposure)

Containerization & Orchestration: Docker, Kubernetes, Helm, Kustomize, OpenShift

Infrastructure as Code: Terraform, AWS CloudFormation, ARM Templates

CI/CD Tools: Jenkins, GitHub Actions, GitLab CI, Azure DevOps, ArgoCD

Monitoring & Observability: Prometheus, Grafana, ELK Stack, Splunk, Datadog, CloudWatch, New Relic

Configuration Management: Ansible, Chef

Scripting & Automation: Bash, Shell, Python, YAML, Groovy

Version Control: Git, GitHub, Bitbucket

Security & DevSecOps: SonarQube, Trivy, Snyk, OWASP practices, HashiCorp Vault

Databases & Middleware: MySQL, PostgreSQL, MongoDB, Redis, Nginx, Apache, Tomcat

Operating Systems: Linux (RHEL, Ubuntu, Amazon Linux), Windows Server

PROFESSIONAL EXPERIENCE:

Client: Cigna Healthcare, CT Jan 2025 – Present

Role: DevOps/SRE Engineer

Responsibilities:

●Architected and managed production-grade Amazon EKS clusters supporting 60+ microservices with auto-scaling groups, multi-AZ high availability, and zero-downtime deployment strategies across staging and production environments.

●Designed and enforced SRE practices by defining SLIs, SLOs, and error budgets, aligning system reliability targets with business objectives and reducing Sev-1 incidents by implementing proactive reliability engineering strategies.

●Built fully automated CI/CD pipelines using Jenkins and GitHub Actions integrating static code analysis (SonarQube), container image scanning (Trivy), and automated Helm-based Kubernetes deployments.

●Implemented GitOps-based continuous deployment using ArgoCD, enabling controlled rollouts, version tracking, rollback strategies, and improved deployment auditability.

●Developed reusable and modular Terraform infrastructure stacks for VPC, EKS, RDS, IAM, and autoscaling configurations, reducing environment provisioning time from days to under one hour.

●Engineered a centralized observability platform using Prometheus, Grafana, and ELK Stack, implementing golden signal monitoring (latency, traffic, errors, saturation) across services.

●Automated production incident remediation through scripted runbooks and alert-triggered Lambda functions, reducing MTTR by over 40%.

●Designed and implemented service mesh solutions to manage microservices communication, enabling traffic routing, resilience, and enhanced observability across containerized environments.

●Configured service mesh policies including traffic splitting, retries, circuit breaking, and security policies to improve application reliability and performance.

●Integrated service mesh with Kubernetes clusters to streamline service discovery, load balancing, and secure communication between distributed services.

●Used the OSHIP Popcorn platform to deploy and manage cloud-native applications while ensuring standardized DevOps processes and operational consistency.

●Led post-incident root cause analysis sessions, implemented preventive controls, and introduced reliability scorecards to track service health.

●Integrated service mesh observability tools with centralized monitoring systems to track service latency, error rates, and request flows.

●Designed cross-region disaster recovery architecture using automated snapshot backups, multi-region S3 replication, and RDS failover strategies.

●Hardened IAM roles and Kubernetes RBAC policies following least-privilege access principles and implemented secrets management using HashiCorp Vault.

●Optimized AWS cost footprint by rightsizing EC2 workloads, enabling autoscaling policies, implementing lifecycle rules on S3, and reviewing idle resource consumption monthly.

●Partnered closely with application teams to containerize legacy workloads and improve deployment velocity without sacrificing stability.

Environment: AWS (EKS, EC2, S3, RDS, IAM, VPC, Lambda, CloudWatch, Auto Scaling), Terraform, Docker, Kubernetes, Helm, ArgoCD, Jenkins, GitHub Actions, Prometheus, Grafana, ELK Stack, HashiCorp Vault, SonarQube, Trivy, Linux, Bash, Python.

Client: Wells Fargo, Hyderabad, India Nov 2023 – Jul 2024

Role: DevOps/SRE Engineer

Responsibilities:

●Managed hybrid cloud infrastructure across Azure and AWS environments supporting application intelligence and code analysis platforms serving global enterprise customers.

●Designed Azure DevOps CI/CD pipelines for multi-branch workflows with gated approvals, artifact versioning, and automated deployment to AKS clusters.

●Containerized monolithic applications into Docker-based microservices and deployed into Azure Kubernetes Service (AKS) with Helm-based templating and environment segregation.

●Implemented proactive monitoring strategies using Datadog and Prometheus to track infrastructure performance, JVM metrics, container resource utilization, and application-level KPIs.

●Defined SLO-driven alerting policies that reduced alert fatigue and improved signal-to-noise ratio across production environments.

●Supported platform engineering initiatives by configuring monitoring, logging, and observability features for applications running through service mesh environments.

●Collaborated with development teams to containerize applications and enable seamless deployment through OSHIP Popcorn-managed infrastructure.

●Maintained platform stability by troubleshooting service mesh networking issues, optimizing configurations, and ensuring high availability of microservices platforms.

●Built Infrastructure as Code modules using Terraform for provisioning AKS clusters, Azure networking components, load balancers, and private endpoints.

●Introduced automated vulnerability scanning for container images and dependencies using Snyk and integrated security gates within CI/CD workflows.

●Improved system reliability by tuning resource limits/requests in Kubernetes, implementing Horizontal Pod Autoscalers, and optimizing node pool configurations.

●Automated configuration drift detection and remediation across multiple environments using Ansible playbooks and policy validation scripts.

●Implemented log aggregation pipelines with Azure Monitor and ELK for deep production diagnostics and performance troubleshooting.

●Participated in on-call rotations, handled Sev-1 and Sev-2 incidents, and executed controlled rollback procedures during critical outages.

●Collaborated with development and architecture teams to improve application observability by embedding distributed tracing and structured logging standards.

Environment: Azure (AKS, ACR, Azure DevOps, Azure Monitor, VNets, Load Balancers), AWS (EC2, S3), Terraform, Docker, Kubernetes, Helm, Datadog, Prometheus, Snyk, Jenkins, Ansible, ELK Stack, Git, Linux, Bash, Python.

Client: TCS, Hyderabad, India Dec 2021 – Oct 2023

Role: DevOps Engineer

Responsibilities:

●Built and maintained CI/CD pipelines using Jenkins for Java and Node.js applications, integrating automated testing, artifact management, and multi-environment deployments.

●Designed AWS infrastructure architecture including VPC design, subnet segmentation, security groups, NAT gateways, and IAM policies for secure production workloads.

●Containerized legacy applications using Docker and deployed to Kubernetes clusters, enabling scalable microservices-based architecture.

●Implemented Infrastructure as Code using Terraform to standardize infrastructure provisioning across development, QA, and production environments.

●Configured and optimized Nginx and Apache web servers for reverse proxy, SSL termination, and load balancing.

●Established a centralized logging solution using ELK Stack, enabling efficient log indexing, search, and root cause identification.

●Automated server provisioning, patching, and configuration management using Ansible playbooks, eliminating manual environment inconsistencies.

●Implemented monitoring dashboards using Grafana and AWS CloudWatch to track CPU, memory, disk utilization, and application health metrics.

●Reduced deployment failures by implementing automated rollback strategies and blue-green deployment techniques.

●Supported database deployments and backups for MySQL and PostgreSQL instances running in AWS RDS environments.

●Improved release cycle efficiency by introducing Git branching strategies, pull request approvals, and version tagging standards.

●Assisted in migrating on-premises workloads to AWS cloud infrastructure with minimal downtime and controlled cutover strategy.

Environment: AWS (EC2, S3, RDS, IAM, VPC, CloudWatch), Jenkins, Docker, Kubernetes, Terraform, Ansible, ELK Stack, Grafana, Git, Linux (RHEL, Ubuntu), Nginx, Apache, MySQL, PostgreSQL.

Client: E-Centric solutions Pvt Ltd, Hyderabad, India Aug 2020 – Nov 2021

Role: Build & Release Engineer

Responsibilities:

●Created the automated build and deployment process for the application, re-engineered the setup for a better user experience, and laid the groundwork for a continuous integration system covering all products.

●Developed and automated build & deployment pipelines using Jenkins, Gradle, ANT, and Maven, optimizing release cycles and ensuring seamless application delivery.

●Implemented Docker-based containerization and deployed Microservices on Amazon EKS, enhancing scalability and performance.

●Automated provisioning and lifecycle management for Ubuntu Linux on AWS EC2 using Chef, Ruby, and Bash scripts.

●Configured and optimized AWS services including EC2, S3, RDS, EBS, Auto Scaling, and Load Balancers for improved application availability.

●Configured Datadog Synthetic Monitoring alongside ELK Stack for proactive issue resolution and real-time observability.

●Integrated Snyk and Aqua Security within Gradle tasks to enforce security scans, ensuring compliance and vulnerability management.

●Developed Perl & Shell scripts for automation of build and release processes, improving deployment efficiency.

●Utilized Puppet manifests and SaltStack for automated scaling and configuration management in Docker Swarm & LXC/LXD environments.

●Worked as a Technical Design team member for the Build & Release Module, contributing to efficient software deployment strategies.

●Enhanced CI/CD workflows by integrating New Relic alerts with Slack, enabling real-time monitoring and quick incident response.

Environment: Maven, Docker, AWS, Chef, Jenkins, Apache Web Server, Apache JMeter, Python, Perl, Shell.

Education Details:

Master's from Avila University (Computer/Information Technology, Administration and Management) - 2025

Bachelor's from Vignan’s Institute of Information Technology (Electrical and Electronics Engineering) - 2021
