Senior SRE/DevOps Engineer with Cloud & Kubernetes Expertise

Location:
Redmond, WA, 98052
Posted:
November 26, 2025

Resume:

NAME: Rahul Bitra (Sr SRE DevOps Engineer)

Phone: 206-***-****

Email: *************@*****.***

PROFESSIONAL SUMMARY

Results-driven Senior SRE/DevOps Engineer with 7+ years of experience designing, automating, and maintaining scalable infrastructure across AWS, Azure, and GCP. Skilled in Kubernetes, OpenShift 4.x, Terraform, Ansible, and Jenkins to build and deploy resilient CI/CD pipelines. Experienced in infrastructure-as-code, GitOps (ArgoCD/FluxCD), and implementing SRE best practices including SLIs, SLOs, SLAs, and error budgets. Proficient in integrating observability and security tools such as Prometheus, Grafana, Dynatrace, and Splunk to enhance system reliability and performance. Strong focus on automation, cloud migration, and operational excellence through continuous improvement.

Proficient in OpenShift 4.x and Kubernetes for orchestrating scalable, resilient container workloads in both on-prem and cloud (AWS/Azure) environments.

Skilled in infrastructure automation using Ansible, Terraform, ARM templates and shell/Python scripting to deploy, configure, and monitor Linux-based environments.

Proficient in developing and maintaining CI/CD pipelines across mixed technology stacks including ARM Templates, COBOL, Java, and TypeScript applications.

Hands-on in integrating monitoring tools like Dynatrace, Prometheus, and Splunk to enhance observability, reduce MTTR, and improve SRE metrics (SLIs/SLOs/SLAs).

Experience managing production workloads on Amazon EKS, implementing autoscaling, service mesh (Istio), and network policies.

Experienced in GitOps using ArgoCD and familiar with FluxCD for configuration-as-code and edge Kubernetes deployments.

Familiar with Apache Flink for real-time data stream processing in observability and log analytics workflows.

Hands-on experience with WebSphere Commerce (WCS), Vertica, and Snowflake for monitoring and data warehousing integration.

Strong focus on error budgets, incident response, and blameless post-mortems to improve MTTR and system reliability.

Deep understanding of SRE principles including fault analysis, SLIs, SLOs, SLAs, and continuous improvement.

Experienced in DevOps practices, CI/CD pipeline management, build and release management, and implementing AWS and Azure solutions.

TECHNICAL SKILLS

Category

Skills/Technologies

Cloud Platforms

Azure (AKS, App Services, Key Vault, Monitor, App Insights, Storage, SQL, VNets, Traffic Manager), AWS (EC2, S3, RDS, EKS, Route53, CloudFormation), GCP (GKE, Cloud SQL, Storage, IAM)

Infrastructure & Services

Azure VMs, Azure App Service, Azure Functions, AKS, Azure Storage (Blob, File), Azure SQL Database, Azure Data Factory, Fault Injection, Error Budgets, Post-Mortem Analysis, Azure Virtual Network; FluxCD (familiar), AWS EC2, ELB, S3, EBS, VPC, Route 53, RDS, ArgoCD, Auto-Scaling, IAM, SNS, SES, SQS, CloudFront, CloudFormation, CloudWatch, Elastic Beanstalk, AWS SageMaker

Identity and Access Management (IAM)

SSO, SAML, OAuth, OIDC, SCIM, Federation, ForgeRock, OpenAM, Okta, Azure AD

Security & Governance

HashiCorp Vault (multi-cloud integration), Azure Key Vault, AWS KMS, GCP Secret Manager, OPA/Gatekeeper, Kyverno, CIS Benchmarks, PodSecurityPolicy, NetworkPolicies

DevOps & Containerization

Jenkins, Terraform, Docker, Ansible, Kubernetes, Git (GitHub, GitLab, Bitbucket), CI/CD for COBOL & TypeScript apps, CircleCI, Code Quality Testing (SonarQube), Azure DevOps Pipelines, Bamboo, ArgoCD

Data Extraction & Manipulation

SQL, NoSQL, Nagios, Prometheus, Splunk, MongoDB, PostgreSQL, MySQL

Development Tools & IDEs

PyCharm, IntelliJ, Visual Studio, Sublime, TFS, Linux, Unix, Bash Scripting, PowerShell, JSON, Perl, XML

Operating Systems

Ubuntu, Windows, Linux, UNIX, Windows Server (2008-2016), VMware, vSphere, VirtualBox

Storage

Data Lake Storage, ETL, ADF

Project Management

Agile, Waterfall Methodologies, JIRA, Trello

Observability & Monitoring

Nagios, Splunk, New Relic, AWS ELK, Prometheus, Grafana, CloudWatch, Azure Monitor, Datadog, Dynatrace, Application Insights

SCM/Version Control Tools

Git, GitLab, Bitbucket, Apache Flink, Vertica, Snowflake

Artifact Repositories

Nexus, Docker Hub, Amazon ECR

Programming Languages

Python, Shell Scripting, PowerShell, Bash, YAML, Perl, Groovy scripts, Java, Golang

Datastores

RDS, Amazon S3, Snowflake, Vertica, PostgreSQL, MySQL

PROFESSIONAL EXPERIENCE

Client: Microsoft, Bellevue, Washington. Jan 2024 – Present

Role: Sr. SRE/Linux Administrator

Responsibilities:

Designed, implemented, and optimized CI/CD pipelines using Jenkins, GitHub Actions, and Azure DevOps to automate multi-stage deployments across AWS and Azure.

Deployed and managed Kubernetes/OpenShift 4.x clusters using GitOps (ArgoCD/Helm) for scalable microservices.

Automated infrastructure provisioning with Terraform and Ansible, improving deployment efficiency by 60%.

Automated deployments to IBM WebSphere Application Server using Jython and UNIX Shell scripts, reducing manual steps by 70%.

Implemented SRE practices, defining SLIs, SLOs, SLAs, and error budgets to measure and enhance reliability.

Integrated observability tools (Prometheus, Grafana, Splunk, Dynatrace) for proactive monitoring and reduced MTTR.

Configured API gateways, JWT authentication, and rate limiting for secure Node.js/Express microservices.

Deployed and secured workloads on AWS EKS and Azure AKS, optimizing autoscaling and resource utilization.

Implemented container image scanning and secret management using Trivy, Aqua Security, and Azure Key Vault.

Automated backup and disaster recovery pipelines using Veeam and Azure Site Recovery, achieving <5 min RTO.

Led chaos engineering and fault injection testing to validate system resilience and improve reliability metrics.

Built real-time log analytics and monitoring pipelines integrating Apache Flink, Snowflake, and Vertica.

Supported and optimized middleware (WebSphere) and Java microservices for high availability in production.

Delivered incident response, root cause analysis, and blameless post-mortems to drive reliability improvements.

Partnered with cross-functional teams to optimize cloud spend (FinOps) and enhance multi-cloud performance.

Environment: Azure DevOps, Kubernetes, Docker, ACS & AKS, Prometheus, Splunk, Terraform, Ansible, Jenkins, Git, Azure Boards, JIRA, Grafana, Python, PowerShell, YAML, Visual Studio Code, Shell, Nginx, Linux, Windows servers.

Client: Walmart, Bentonville, Arkansas. Jan 2023 – Dec 2023

Role: Sr. DevOps Engineer

Responsibilities:

Automated end-to-end builds and deployments for platform-specific applications on AWS (EC2, EBS, S3, IAM, Route53, Lambda, SNS), reducing manual intervention for application teams in an Agile environment.

Deployed and maintained OpenShift 4.x clusters, integrating LDAP, RBAC, and GitOps workflows using ArgoCD.

Designed and implemented end-to-end CI/CD pipelines using GitLab CI/CD, Jenkins, and ArgoCD for automated builds and deployments on AWS and OpenShift.

Deployed and maintained OpenShift 4.x and AWS EKS clusters, integrating RBAC, LDAP, and Helm charts for microservice deployments.

Automated deployment workflows to IBM WebSphere and OpenShift environments using Jython and Shell scripting.

Automated infrastructure provisioning using Terraform and Ansible, ensuring consistent and repeatable environment builds.

Implemented GitOps workflows with ArgoCD to standardize deployments across dev, QA, and production clusters.

Managed containerized workloads using Docker and Kubernetes, improving release frequency and deployment reliability.

Integrated Prometheus, Grafana, and ELK Stack for observability, real-time monitoring, and alerting of microservices.

Enhanced application security through OPA/Gatekeeper policies, RBAC, and CIS-compliant configurations.

Built backend microservices using Node.js/Express and GraphQL APIs, improving data query performance by 25%.

Configured AWS infrastructure (EC2, S3, RDS, VPC, IAM) and automated scaling using Auto Scaling Groups.

Implemented disaster recovery and backup automation with Velero and AWS Lambda, achieving high availability across clusters.

Developed and maintained Terraform modules for reusable, version-controlled infrastructure deployments.

Reduced manual interventions by 50% by automating build/test/deployment workflows using CI/CD pipelines.

Enforced security and compliance across cloud workloads (AWS/Azure) using Kyverno and policy-as-code principles.

Conducted chaos testing and DR drills to validate system resilience and ensure business continuity.

Collaborated with cross-functional teams to drive cloud migration, monitoring optimization, and release automation initiatives.

Environment: AWS, EC2, AMI, S3, EBS, Elastic Load Balancer (ELB), Auto Scaling groups, Glacier, VPC, IAM, CloudWatch, Akamai, DynamoDB, MySQL, shell scripts, Elasticsearch, Logstash, Kibana, Git, GitHub, Maven, Chef, Artifactory, Selenium, Docker, Mesos, Jenkins, PowerShell.

Client: GE Commercial Finance, Chennai, India. Oct 2019 – July 2022

Role: Sr. AWS DevOps Engineer

Responsibilities:

Worked extensively with AWS services like EC2, S3, VPC, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS.

Created AWS CloudFormation templates to provision custom-sized VPCs, subnets, and NAT, supporting successful deployment of web applications and database templates.

Designed and managed AWS infrastructure using EC2, S3, VPC, RDS, CloudFormation, and IAM for scalable and secure deployments.

Automated provisioning and configuration using Terraform, Ansible, and CloudFormation, reducing setup time by 60%.

Built and maintained CI/CD pipelines in Jenkins and Bitbucket, integrating with Maven, Nexus, and SonarQube.

Managed OpenShift and Kubernetes clusters, integrating Prometheus, Grafana, and Splunk for monitoring and alerting.

Modernized COBOL and Java build workflows using Groovy and shell scripting, improving deployment reliability.

Implemented IAM, RBAC, and Key Vault/KMS policies to enhance identity management and compliance.

Deployed and monitored microservices using Docker and Helm charts, ensuring efficient container orchestration.

Configured AWS Workspaces and AppStream 2.0 for remote user access, improving system availability and performance.

Automated ELK stack (Elasticsearch, Logstash, Kibana) setup and maintenance using Ansible for centralized logging.

Integrated New Relic, AppDynamics, and Datadog for proactive system performance monitoring and anomaly detection.

Developed scripts in Python and Shell to automate routine operational tasks and system monitoring.

Ensured application uptime and scalability through load balancing, auto-scaling, and fault-tolerant architectures.

Created disaster recovery runbooks and performed periodic DR drills to ensure RTO/RPO compliance.

Collaborated with development and QA teams to enable continuous testing and automated releases.

Participated in Agile sprints, providing infrastructure updates, release readiness, and incident resolution insights.

Environment: Azure Cloud Services (PaaS & IaaS), Document DB, Azure Monitoring, Key Vault, AKS, ACR, Blob Storage, Cosmos DB, MongoDB, MySQL, Visual Studio Online (VSO), SQL Azure, EC2, S3, VPC, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, RDS, Subversion (SVN), Git, Jenkins, Maven, Bitbucket, Nexus, SonarQube.

Client: Magene Life Sciences, Hyderabad, India. Jun 2018 – Sep 2019

Role: DevOps Engineer

Responsibilities:

Mentored junior team members on best practices for Azure administration, fostering a culture of continuous learning and professional growth.

Implemented CI/CD pipelines using Jenkins to automate build, test, and deployment processes across development and production environments.

Deployed and managed containerized applications using Docker and Kubernetes, improving scalability and deployment speed.

Automated infrastructure provisioning using Terraform and Azure ARM templates, ensuring consistency across environments.

Managed AWS services including EC2, S3, VPC, and Elastic Beanstalk for application hosting and data storage.

Configured Azure Monitor and Veeam backup to ensure data protection, achieving RTO/RPO compliance.

Set up monitoring and alerting systems using Prometheus, Grafana, and ELK Stack to enhance observability.

Performed system patching, configuration management, and security hardening on Linux servers.

Supported middleware components like WebLogic, WebSphere, and Tomcat for application deployment and troubleshooting.

Created PowerShell and Shell scripts to automate routine administrative and deployment tasks.

Managed source code repositories using Git and enforced branching strategies for continuous integration.

Collaborated with development teams to integrate automated testing frameworks in CI/CD pipelines.

Conducted root cause analysis (RCA) and implemented corrective actions to improve reliability and uptime.

Implemented disaster recovery testing with Veeam and Azure Site Recovery to ensure business continuity.

Supported JIRA and ServiceNow workflows for issue tracking, incident management, and project coordination.

Documented deployment processes and best practices to streamline knowledge sharing and reduce operational overhead.

Environment: UNIX, MQ, AWS, Maven, Ant, Jenkins, Shell, Java, JIRA, ServiceNow, Apache Tomcat, VPC, Elastic Beanstalk, Docker, Nginx, Stratus COBOL, Git, File System, Forms, Macros, JCL, DB2, Teradata.

EDUCATION

NORTHWEST MISSOURI STATE UNIVERSITY Maryville, MO

Master of Science in Computer Science, GPA: 3.80 AUG 2022 - DEC 2023

VR SIDDHARTHA ENGINEERING COLLEGE Vijayawada, India

Bachelor of Technology, Electronics and Instrumentation Engineering, GPA: 3.4 JULY 2014 - May 2018


