
Principal Site Reliability Engineer

Location: Ashburn, VA

Posted: April 30, 2025


Resume:

Umesh Raj Bhatt

****************@*****.*** +1-276-***-**** LinkedIn

Principal Site Reliability Engineer

Professional Summary:

Highly skilled Site Reliability Engineer with over 15 years of experience in cloud infrastructure, automation, and distributed systems design. Proficient in OCI, AWS, Docker, Kubernetes, and Terraform, with a strong focus on Infrastructure as Code (IaC), CI/CD pipelines, and monitoring and alerting systems. Expertise in ensuring high availability, performance optimization, and security compliance (PCI, SOC2, FIPS). Adept at leading cross-functional teams to design and implement scalable solutions, reduce downtime, and enhance operational efficiency. Demonstrated success in automating workflows, managing incidents, and maintaining security compliance for large-scale cloud environments.

Technical Skills:

Cloud Platforms: AWS, Oracle Cloud Infrastructure (OCI), Azure

Containerization & Orchestration: Docker, Kubernetes

Automation & IaC: Terraform, Ansible, Chef, Puppet

CI/CD Tools: Jenkins, GitHub, Bitbucket, Shepherd

Monitoring & Observability: Prometheus, Grafana, ELK Stack, Datadog, Nagios

Security & Compliance: PCI DSS, SOC2, NIST 800-171, IAM, System Hardening, Security Auditing

Programming & Scripting: Python, Ruby, Bash, Shell scripting

Incident Management & Troubleshooting: Root Cause Analysis, Incident Response, System Debugging

Version Control & Collaboration: Git, Bitbucket, GitHub, Bitbucket Pipelines

Work Experience:

Oracle

Principal Site Reliability Engineer

January 2022 - Present

Spearheaded the architecture, deployment, and management of critical cloud infrastructure and services in OCI and AWS, driving improvements in scalability, security, and availability.

Automated cloud resource provisioning using Terraform and Ansible, reducing manual configuration efforts by over 70% and increasing system efficiency.

Implemented and managed CI/CD pipelines using Jenkins, GitHub, and Bitbucket, optimizing deployment cycles and enhancing collaboration between development and operations teams.

Designed and deployed comprehensive monitoring and alerting systems with Prometheus, Grafana, and ELK stack, improving system health visibility and reducing downtime by 30%.

Led security compliance efforts covering PCI, SOC2, and FIPS controls, as well as system hardening.

Conducted in-depth performance analysis and capacity planning, proactively identifying and mitigating bottlenecks to ensure optimal resource utilization.

Oracle

Principal Systems Engineer

October 2014 - December 2022

Led the cloud migration strategy to Oracle Cloud Infrastructure (OCI), overseeing the transition of 40,000+ virtual machines and services, optimizing performance, cost, and scalability.

Pioneered the use of Infrastructure as Code (IaC) to automate infrastructure provisioning and management, streamlining operations and reducing deployment time by 60%.

Developed and maintained monitoring and alerting systems using ELK stack, Nagios, and Prometheus, ensuring proactive issue resolution and minimizing system downtime.

Championed the implementation of zero-downtime patching using Ksplice for Linux systems, reducing patching time by 90% and mitigating production disruptions.

Delivered comprehensive security auditing and log management solutions for compliance with PCI, SOC2, and FIPS, ensuring audit readiness and security best practices.

Micros Systems, Inc.

Senior Systems Engineer

August 2009 - September 2014

Led the design, deployment, and maintenance of high-availability Linux and Windows server systems, utilizing Puppet for configuration management and reducing build times by 95%.

Architected and managed the deployment of a CyberArk password management system, securing 5,000+ Linux and Windows systems and enhancing compliance with security standards.

Implemented automated Linux OS patching for critical production systems, significantly reducing manual intervention and ensuring system security and compliance.

Conducted capacity planning for data center operations, optimizing resource allocation and ensuring performance scalability under heavy workloads.

Certifications:

Certified Kubernetes Administrator (CKA)

AWS Certified Solutions Architect – Associate

Oracle Cloud Infrastructure 2020 Architect Professional

VMware Certified Professional – Data Center Virtualization

Oracle Cloud Infrastructure 2023 AI Foundations Associate

Red Hat Certified Engineer (RHCE)

CCNA, ITIL Foundation Certificate in IT Service Management

DevOps Fundamentals, Terraform, Ansible

Additional Information:

DOD IL5 US Person Verification clearance

Proficient in designing and implementing containerized solutions using Docker and Kubernetes for seamless application deployment and orchestration in cloud environments

Strong communicator and team player, with a proven ability to collaborate effectively with cross-functional teams to deliver solutions that enhance system reliability, security, and performance

Previous Experience:

System Administrator – Nepal Bank Limited (Jan 2006 – Jan 2009)

IT Consultant – Sagarmatha Chaudhary Eye Hospital (Jan 2007 – Oct 2008)

Education:

Maharishi University of Management

Fairfield, Iowa, USA

Master of Science in Computer Science

Graduation: May 2012

Pokhara University

Pokhara, Nepal

Bachelor of Computer Engineering

Graduation: October 2008


