Senior DevOps/SRE Engineer with Cloud Expertise

Location:
Hyderabad, Telangana, India
Posted:
January 13, 2026

Resume:

Srikanth Bommadi

DevOps/Site Reliability Engineer

+1-302-***-**** ****************@*****.*** https://github.com/SrikanthBommadi

PROFESSIONAL SUMMARY:

Over 5 years of experience automating, deploying, and operating highly available systems across Cloud Computing, SRE, DevOps, Systems Administration, and SCM, supporting both data and application platforms.

Strong experience with SCM and Agile delivery tools including Git, Bitbucket, SVN, and Jira, supporting branching strategies, release management, and coordinated production deployments across Linux/UNIX and Windows environments.

Extensive hands-on experience with AWS cloud services including EKS, ECS, EC2, VPC, IAM, S3, RDS, Lambda, SQS, Auto Scaling, ELB, Route53, CloudWatch, CloudTrail, and CloudFormation, designing scalable and fault-tolerant architectures.

Practical experience with Azure infrastructure and security services such as VMs, AKS, VNet, Azure SQL, Load Balancer, Storage Accounts, Azure AD, and RBAC, ensuring secure and reliable cloud operations.

Strong expertise in Infrastructure as Code using Terraform, AWS CloudFormation, and AWS CDK (Python), building reusable modules and enforcing security guardrails and operational standards.

Hands-on experience developing Python-based AWS Lambda functions for automation, event-driven workflows, and operational tooling, leveraging libraries such as boto3 for AWS service integration, requests for API interactions, pandas and NumPy for data processing, and PyYAML for configuration handling to improve reliability and reduce manual effort.

Proven experience in Docker containerization and Kubernetes orchestration (Amazon EKS, OpenShift), including cluster provisioning, upgrades, networking, security, and high availability for microservices workloads, with centralized cluster management using Rancher.

Implemented enterprise CI/CD and release strategies using Jenkins, ArgoCD, and Spinnaker, enabling blue-green and zero-downtime deployments with automated rollback and controlled canary releases.

Applied DevSecOps practices by implementing SAST, DAST, dependency, container, Infrastructure as Code (IaC), and compliance scans within CI/CD pipelines using tools such as SonarQube, Veracode, and Trivy to ensure secure, reliable, and production-ready cloud deployments.

Applied SRE best practices by defining SLIs, SLOs, and SLAs, implementing observability using Prometheus, ELK, Dynatrace, Splunk, and CloudWatch, performing RCA, and collaborating closely with cross-functional teams in production environments.

Supported reliability and availability improvements by defining and monitoring RPO, RTO, MTTR, and MTTF metrics, supporting disaster recovery planning, incident response, and continuous service reliability across production environments.

EDUCATION:

Master of Science in Information Systems from Wilmington University, Wilmington, DE. Aug 2022 - June 2024

Bachelor of Technology in Computer Science from JNTUH, Hyderabad, India. May 2016 - March 2021

CERTIFICATIONS:

AWS Certified Cloud Practitioner

AWS Solutions Architect Associate

HashiCorp Certified: Terraform Associate

Certified Kubernetes Engineer (CKE)

TECHNICAL SKILLS:

CI/CD Tools: Jenkins, GitLab CI, CircleCI, ArgoCD

Cloud Platforms: AWS, Azure, GCP

Programming: Python, Java, Shell, Go

Containerization: Docker, Kubernetes, OpenShift, AKS, EKS

IaC Tools: Terraform, CloudFormation, Ansible

Database: Oracle, MySQL, SQL Server, PostgreSQL, MongoDB

Containers: Docker, Kubernetes, Helm

Scripting: Bash, Python, Groovy, Go

Monitoring: Prometheus, Grafana, ELK Stack, Datadog, CloudWatch

Source Control: Git, GitHub, Bitbucket

OS & Servers: Linux (Ubuntu/CentOS), Nginx, Apache

Versioning: Semantic Versioning, Git Flow

PROFESSIONAL EXPERIENCE:

Air Canada - Philadelphia, PA July 2024 to Present

DevOps Engineer

Responsibilities:

Facilitated production releases and supported incident management and advanced operational support for AWS-hosted applications, ensuring platform stability.

Developed shell scripts and Python automation for AWS Lambda workflows to improve scalability and reduce operational overhead.

Automated configuration management and patching using Ansible, and supported root cause analysis and preventive remediation for recurring infrastructure issues.

Managed AWS infrastructure using Terraform and CloudFormation, including EC2, VPC, S3, and IAM, ensuring consistent and reliable cloud environments.

Containerized applications using Docker and deployed workloads on EKS/Kubernetes, optimizing availability and recovery for production systems.

Engineered Helm charts and Kubernetes Operators to standardize deployments and support high-availability Kubernetes environments on EKS.

Implemented Kafka messaging and Redis caching to improve application reliability and fault tolerance during high-load events.

Built Jenkins CI/CD pipelines for Java, Python, and Go services, enabling reliable application delivery on AWS platforms.

Leveraged Istio for traffic control and mTLS to improve resilience, observability, and secure service communication.

Used ArgoCD GitOps workflows for drift detection and automated remediation, supporting stable AWS infrastructure states.

Integrated the ELK Stack for centralized logging and incident investigation, enabling faster root cause identification.

Configured Prometheus, Grafana, and CloudWatch with alerting to support event monitoring, incident response, and reduced MTTR.

Environment:

Git, Unix/Linux, Python, Groovy, SQL, Docker, Kubernetes, OpenShift, Helm, Jenkins, Maven, Nexus, JFrog, ArgoCD, Kafka, Redis, Ansible, Terraform, CloudFormation, Prometheus, Grafana, ELK, Splunk, AWS (EC2, EBS, S3, VPC, RDS, ELB, Auto Scaling, CloudWatch, SNS, ElastiCache)

eHealth - Dallas, Texas Nov 2022 to June 2024

Site Reliability Engineer

Responsibilities:

Automated HIPAA-compliant CI/CD pipelines using Jenkins, Trivy, Aqua Security, and SonarQube, ensuring continuous PHI protection and preventing regulatory drift across environments.

Architected high-availability, low-latency AWS infrastructure and Amazon EKS clusters for telemedicine and remote patient monitoring platforms, enabling real-time vitals streaming and 24/7 clinical care delivery.

Deployed and managed containerized healthcare workloads using Docker and Helm, standardizing environments for biometric data processing, diagnostics, and clinical trial applications.

Built and maintained disaster-recovery automation using Terraform and Ansible, supporting near-zero RTO/RPO and ensuring data integrity for electronic health records (EHRs).

Integrated Prometheus, Grafana, and the ELK Stack for compliance-driven observability and centralized logging, improving SLA adherence and reducing incident resolution time by 30%.

Streamlined analytics and reporting dashboard deployments through Jenkins CI pipelines and ArgoCD GitOps, reducing release cycles by 35% while maintaining full audit visibility.

Applied AWS IAM, KMS, Secrets Manager, and encryption standards to secure PHI and meet healthcare compliance requirements.

Supported production Amazon EKS operations, including scaling, cluster upgrades, security hardening, and workload reliability for critical healthcare services.

Utilized Rancher for centralized Kubernetes cluster management, Spinnaker for continuous delivery and controlled rollouts, and Jira Service Management for incident tracking, change management, and coordinated on-call response workflows across healthcare platforms.

Used Auto Scaling, Application and Network Load Balancers, and Multi-AZ deployments to handle variable telemedicine traffic and ensure high availability.

Supported blue-green and canary deployments to minimize risk and downtime during healthcare application releases.

Led cost optimization initiatives using tagging strategies, autoscaling, and usage analysis while maintaining performance for patient-facing systems.

Supported disaster-recovery drills and failover testing to validate business continuity for critical patient-care platforms.

Environment:

AWS, Amazon EKS, Kubernetes, Docker, Helm, Rancher, Spinnaker, Terraform, Ansible, Jenkins, ArgoCD, GitOps, Linux (RedHat 7/8), Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), AWS IAM, KMS, Secrets Manager, Auto Scaling, Application & Network Load Balancers, Git, Jira Service Management

Heeddata - Hyderabad, India April 2020 to March 2022

Cloud Engineer

Responsibilities:

Facilitated Agile cloud projects, ensuring scalable architecture, continuous optimization, and stable production operations through automation and best practices.

Designed and maintained resilient AWS environments using Infrastructure as Code (IaC) with Terraform, enabling high availability and performance across compute, networking, and security layers.

Strengthened security posture by implementing secure credential management using AWS Secrets Manager and SSM Parameter Store, aligning with audit and compliance standards.

Employed Terraform and Git-based IaC workflows to enable version control, team collaboration, and safe infrastructure rollbacks across environments.

Built and deployed fault-tolerant AWS infrastructure (EC2, S3, VPC, RDS), supporting high availability and scalable application workloads.

Implemented AWS Identity and Access Management (IAM) policies across 50+ cloud resources, enforcing least-privilege access and reducing potential security exposure by approximately 60%.

Orchestrated containerized workloads using Amazon EKS (Kubernetes) and Amazon ECS (Docker), ensuring efficient scaling, high availability, and optimized resource utilization.

Environment:

AWS, Terraform, Git, Jenkins, Maven, Unix/Linux, Ansible, Docker, Kubernetes (EKS), Amazon ECS, AWS IAM, EC2, S3, VPC, RDS, Load Balancers, Auto Scaling, AWS Secrets Manager, SSM Parameter Store.


