NAME: Rahul Bitra (Sr SRE DevOps Engineer)
Phone: 206-***-****
Email: *************@*****.***
PROFESSIONAL SUMMARY
Results-driven Senior SRE/DevOps Engineer with 7+ years of experience designing, automating, and maintaining scalable infrastructure across AWS, Azure, and GCP. Skilled in Kubernetes, OpenShift 4.x, Terraform, Ansible, and Jenkins to build and deploy resilient CI/CD pipelines. Experienced in infrastructure-as-code, GitOps (ArgoCD/FluxCD), and implementing SRE best practices including SLIs, SLOs, SLAs, and error budgets. Proficient in integrating observability and security tools such as Prometheus, Grafana, Dynatrace, and Splunk to enhance system reliability and performance. Strong focus on automation, cloud migration, and operational excellence through continuous improvement.
Proficient in OpenShift 4.x and Kubernetes for orchestrating scalable, resilient container workloads across on-prem and cloud (AWS/Azure) environments.
Skilled in infrastructure automation using Ansible, Terraform, ARM templates and shell/Python scripting to deploy, configure, and monitor Linux-based environments.
Proficient in developing and maintaining CI/CD pipelines across mixed technology stacks including ARM Templates, COBOL, Java, and TypeScript applications.
Hands-on in integrating monitoring tools like Dynatrace, Prometheus, and Splunk to enhance observability, reduce MTTR, and improve SRE metrics (SLIs/SLOs/SLAs).
Experience managing production workloads on Amazon EKS, implementing autoscaling, service mesh (Istio), and network policies.
Experienced in GitOps using ArgoCD and familiar with FluxCD for configuration-as-code and edge Kubernetes deployments.
Familiar with Apache Flink for real-time data stream processing in observability and log analytics workflows.
Hands-on experience with WebSphere Commerce (WCS), Vertica, and Snowflake for monitoring and data warehousing integration.
Strong focus on error budgets, incident response, and blameless post-mortems to improve MTTR and system reliability.
Deep understanding of SRE principles including fault analysis, SLIs, SLOs, SLAs, and continuous improvement.
Experienced in DevOps practices, CI/CD pipeline management, build and release management, and implementing AWS and Azure solutions.
TECHNICAL SKILLS
Category
Skills/Technologies
Cloud Platforms
Azure (AKS, App Services, Key Vault, Monitor, App Insights, Storage, SQL, VNets, Traffic Manager), AWS (EC2, S3, RDS, EKS, Route53, CloudFormation), GCP (GKE, Cloud SQL, Storage, IAM)
Infrastructure & Services
Azure VMs, Azure App Service, Azure Functions, AKS, Azure Storage (Blob, File), Azure SQL Database, Azure Data Factory, Azure Virtual Network, AWS EC2, ELB, S3, EBS, VPC, Route 53, RDS, Auto-Scaling, IAM, SNS, SES, SQS, CloudFront, CloudFormation, CloudWatch, Elastic Beanstalk, AWS SageMaker, ArgoCD, FluxCD (familiar), Fault Injection, Error Budgets, Post-Mortem Analysis
Identity and Access Management (IAM)
SSO, SAML, OAuth, OIDC, SCIM, Federation, ForgeRock, OpenAM, Okta, Azure AD
Security & Governance
HashiCorp Vault (multi-cloud integration), Azure Key Vault, AWS KMS, GCP Secret Manager, OPA/Gatekeeper, Kyverno, CIS Benchmarks, PodSecurityPolicy, NetworkPolicies
DevOps & Containerization
Jenkins, Terraform, Docker, Ansible, Kubernetes, Git (GitHub, GitLab, Bitbucket), CI/CD for COBOL & TypeScript apps, CircleCI, Code Quality Testing (SonarQube), Azure DevOps Pipelines, Bamboo, ArgoCD
Data Extraction & Manipulation
SQL, NoSQL, Nagios, Prometheus, Splunk, MongoDB, PostgreSQL, MySQL
Development Tools & IDEs
PyCharm, IntelliJ, Visual Studio, Sublime, TFS, Linux, Unix, Bash Scripting, PowerShell, JSON, Perl, XML
Operating Systems
Ubuntu, Windows, Linux, UNIX, Windows Server (2008-2016), VMware, vSphere, VirtualBox
Storage
Data Lake Storage, ETL, ADF
Project Management
Agile, Waterfall Methodologies, JIRA, Trello
Observability & Monitoring
Nagios, Splunk, New Relic, AWS ELK, Prometheus, Grafana, CloudWatch, Azure Monitor, Datadog, Dynatrace, Application Insights
SCM/Version Control Tools
Git, GitLab, Bitbucket
Artifact Repositories
Nexus, Docker Hub, Amazon ECR
Programming Languages
Python, Shell Scripting, PowerShell, Bash, YAML, Perl, Groovy scripts, Java, Golang
Datastores
RDS, Amazon S3, Snowflake, Vertica, PostgreSQL, MySQL
PROFESSIONAL EXPERIENCE
Client: Microsoft, Bellevue, Washington. Jan 2024 – Present
Role: Sr. SRE/Linux Administrator
Responsibilities:
Designed, implemented, and optimized CI/CD pipelines using Jenkins, GitHub Actions, and Azure DevOps to automate multi-stage deployments across AWS and Azure.
Deployed and managed Kubernetes/OpenShift 4.x clusters using GitOps (ArgoCD/Helm) for scalable microservices.
Automated infrastructure provisioning with Terraform and Ansible, improving deployment efficiency by 60%.
Automated deployments to IBM WebSphere Application Server using Jython and UNIX Shell scripts, reducing manual steps by 70%.
Implemented SRE practices, defining SLIs, SLOs, SLAs, and error budgets to measure and enhance reliability.
Integrated observability tools (Prometheus, Grafana, Splunk, Dynatrace) for proactive monitoring and reduced MTTR.
Configured API gateways, JWT authentication, and rate limiting for secure Node.js/Express microservices.
Deployed and secured workloads on AWS EKS and Azure AKS, optimizing autoscaling and resource utilization.
Implemented container image scanning and secret management using Trivy, Aqua Security, and Azure Key Vault.
Automated backup and disaster recovery pipelines using Veeam and Azure Site Recovery, achieving <5 min RTO.
Led chaos engineering and fault injection testing to validate system resilience and improve reliability metrics.
Built real-time log analytics and monitoring pipelines integrating Apache Flink, Snowflake, and Vertica.
Supported and optimized middleware (WebSphere) and Java microservices for high availability in production.
Delivered incident response, root cause analysis, and blameless post-mortems to drive reliability improvements.
Partnered with cross-functional teams to optimize cloud spend (FinOps) and enhance multi-cloud performance.
Environment: Azure DevOps, Kubernetes, Docker, ACS & AKS, Prometheus, Splunk, Terraform, Ansible, Jenkins, Git, Azure Boards, JIRA, Grafana, Python, PowerShell, YAML, Visual Studio Code, Shell, Nginx, Linux, Windows servers.
Client: Walmart, Bentonville, Arkansas. Jan 2023 – Dec 2023
Role: Sr. DevOps Engineer
Responsibilities:
Automated end-to-end builds and deployments for platform-specific applications on AWS (EC2, EBS, S3, IAM, Route53, Lambda, SNS), reducing manual intervention for application teams in an Agile environment.
Deployed and maintained OpenShift 4.x clusters, integrating LDAP, RBAC, and GitOps workflows using ArgoCD.
Designed and implemented end-to-end CI/CD pipelines using GitLab CI/CD, Jenkins, and ArgoCD for automated builds and deployments on AWS and OpenShift.
Deployed and maintained OpenShift 4.x and AWS EKS clusters, integrating RBAC, LDAP, and Helm charts for microservice deployments.
Automated deployment workflows to IBM WebSphere and OpenShift environments using Jython and Shell scripting.
Automated infrastructure provisioning using Terraform and Ansible, ensuring consistent and repeatable environment builds.
Implemented GitOps workflows with ArgoCD to standardize deployments across dev, QA, and production clusters.
Managed containerized workloads using Docker and Kubernetes, improving release frequency and deployment reliability.
Integrated Prometheus, Grafana, and ELK Stack for observability, real-time monitoring, and alerting of microservices.
Enhanced application security through OPA/Gatekeeper policies, RBAC, and CIS-compliant configurations.
Built backend microservices using Node.js/Express and GraphQL APIs, improving data query performance by 25%.
Configured AWS infrastructure (EC2, S3, RDS, VPC, IAM) and automated scaling using Auto Scaling Groups.
Implemented disaster recovery and backup automation with Velero and AWS Lambda, achieving high availability across clusters.
Developed and maintained Terraform modules for reusable, version-controlled infrastructure deployments.
Reduced manual interventions by 50% by automating build/test/deployment workflows using CI/CD pipelines.
Enforced security and compliance across cloud workloads (AWS/Azure) using Kyverno and policy-as-code principles.
Conducted chaos testing and DR drills to validate system resilience and ensure business continuity.
Collaborated with cross-functional teams to drive cloud migration, monitoring optimization, and release automation initiatives.
Environment: AWS, EC2, AMI, S3, EBS, Elastic Load Balancer (ELB), Auto Scaling Groups, Glacier, VPC, IAM, CloudWatch, Akamai, DynamoDB, MySQL, shell scripts, Elasticsearch, Logstash, Kibana, Git, GitHub, Maven, Chef, Artifactory, Selenium, Docker, Mesos, Jenkins, PowerShell.
Client: GE Commercial Finance, Chennai, India. Oct 2019 – July 2022
Role: Sr. AWS DevOps Engineer
Responsibilities:
Worked extensively with AWS services like EC2, S3, VPC, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS.
Created AWS CloudFormation templates to provision custom-sized VPCs, subnets, and NAT, supporting successful deployment of web applications and database templates.
Designed and managed AWS infrastructure using EC2, S3, VPC, RDS, CloudFormation, and IAM for scalable and secure deployments.
Automated provisioning and configuration using Terraform, Ansible, and CloudFormation, reducing setup time by 60%.
Built and maintained CI/CD pipelines in Jenkins and Bitbucket, integrating with Maven, Nexus, and SonarQube.
Managed OpenShift and Kubernetes clusters, integrating Prometheus, Grafana, and Splunk for monitoring and alerting.
Modernized COBOL and Java build workflows using Groovy and shell scripting, improving deployment reliability.
Implemented IAM, RBAC, and Key Vault/KMS policies to enhance identity management and compliance.
Deployed and monitored microservices using Docker and Helm charts, ensuring efficient container orchestration.
Configured AWS WorkSpaces and AppStream 2.0 for remote user access, improving system availability and performance.
Automated ELK stack (Elasticsearch, Logstash, Kibana) setup and maintenance using Ansible for centralized logging.
Integrated New Relic, AppDynamics, and Datadog for proactive system performance monitoring and anomaly detection.
Developed scripts in Python and Shell to automate routine operational tasks and system monitoring.
Ensured application uptime and scalability through load balancing, auto-scaling, and fault-tolerant architectures.
Created disaster recovery runbooks and performed periodic DR drills to ensure RTO/RPO compliance.
Collaborated with development and QA teams to enable continuous testing and automated releases.
Participated in Agile sprints, providing infrastructure updates, release readiness, and incident resolution insights.
Environment: Azure Cloud Services (PaaS & IaaS), Document DB, Azure Monitoring, Key Vault, AKS, ACR, Blob Storage, Cosmos DB, MongoDB, MySQL, Visual Studio Online (VSO), SQL Azure, EC2, S3, VPC, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, RDS, Subversion (SVN), Git, Jenkins, Maven, Bitbucket, Nexus, SonarQube.
Client: Magene Life Sciences, Hyderabad, India. Jun 2018 – Sep 2019
Role: DevOps Engineer
Responsibilities:
Mentored junior team members on best practices for Azure administration, fostering a culture of continuous learning and professional growth.
Implemented CI/CD pipelines using Jenkins to automate build, test, and deployment processes across development and production environments.
Deployed and managed containerized applications using Docker and Kubernetes, improving scalability and deployment speed.
Automated infrastructure provisioning using Terraform and Azure ARM templates, ensuring consistency across environments.
Managed AWS services including EC2, S3, VPC, and Elastic Beanstalk for application hosting and data storage.
Configured Azure Monitor and Veeam backup to ensure data protection, achieving RTO/RPO compliance.
Set up monitoring and alerting systems using Prometheus, Grafana, and ELK Stack to enhance observability.
Performed system patching, configuration management, and security hardening on Linux servers.
Supported middleware components like WebLogic, WebSphere, and Tomcat for application deployment and troubleshooting.
Created PowerShell and Shell scripts to automate routine administrative and deployment tasks.
Managed source code repositories using Git and enforced branching strategies for continuous integration.
Collaborated with development teams to integrate automated testing frameworks in CI/CD pipelines.
Conducted root cause analysis (RCA) and implemented corrective actions to improve reliability and uptime.
Implemented disaster recovery testing with Veeam and Azure Site Recovery to ensure business continuity.
Supported JIRA and ServiceNow workflows for issue tracking, incident management, and project coordination.
Documented deployment processes and best practices to streamline knowledge sharing and reduce operational overhead.
Environment: UNIX, MQ, AWS, Maven, Ant, Jenkins, Shell, Java, JIRA, ServiceNow, Apache Tomcat, VPC, Elastic Beanstalk, Docker, Nginx, Stratus COBOL, Git, File System, Forms, Macros, JCL, DB2, Teradata.
EDUCATION
NORTHWEST MISSOURI STATE UNIVERSITY Maryville, MO
Master of Science in Computer Science, GPA: 3.80 Aug 2022 – Dec 2023
VR SIDDHARTHA ENGINEERING COLLEGE Vijayawada, India
Bachelor of Technology, Electronics and Instrumentation Engineering, GPA: 3.4 July 2014 – May 2018