Umesh Raj Bhatt
****************@*****.*** +1-276-***-**** LinkedIn
Principal Site Reliability Engineer
Professional Summary:
Highly skilled Site Reliability Engineer with over 15 years of experience in cloud infrastructure, automation, and distributed systems design. Proficient in OCI, AWS, Docker, Kubernetes, and Terraform, with a strong focus on Infrastructure as Code (IaC), CI/CD pipelines, and monitoring and alerting systems. Expertise in high availability, performance optimization, and security compliance (PCI, SOC2, FIPS). Adept at leading cross-functional teams to design and implement scalable solutions, reduce downtime, and enhance operational efficiency. Demonstrated success in automating workflows, managing incidents, and maintaining compliance across large-scale cloud environments.
Technical Skills:
Category: Skills/Tools
Cloud Platforms: AWS, Oracle Cloud Infrastructure (OCI), Azure
Containerization & Orchestration: Docker, Kubernetes
Automation & IaC: Terraform, Ansible, Chef, Puppet
CI/CD Tools: Jenkins, GitHub, Bitbucket, Shepherd
Monitoring & Observability: Prometheus, Grafana, ELK Stack, Datadog, Nagios
Security & Compliance: PCI DSS, SOC2, NIST 800-171, IAM, System Hardening, Security Auditing
Programming & Scripting: Python, Ruby, Bash, Shell scripting
Incident Management & Troubleshooting: Root Cause Analysis, Incident Response, System Debugging
Version Control & Collaboration: Git, Bitbucket, GitHub, Bitbucket Pipelines
Work Experience:
Oracle
Principal Site Reliability Engineer
January 2022 - Present
Spearheaded the architecture, deployment, and management of critical cloud infrastructure and services in OCI and AWS, driving improvements in scalability, security, and availability.
Automated cloud resource provisioning using Terraform and Ansible, reducing manual configuration efforts by over 70% and increasing system efficiency.
Implemented and managed CI/CD pipelines using Jenkins, GitHub, and Bitbucket, optimizing deployment cycles and enhancing collaboration between development and operations teams.
Designed and deployed comprehensive monitoring and alerting systems with Prometheus, Grafana, and ELK stack, improving system health visibility and reducing downtime by 30%.
Led security compliance efforts covering PCI, SOC2, and FIPS controls, as well as system hardening.
Conducted in-depth performance analysis and capacity planning, proactively identifying and mitigating bottlenecks, ensuring optimal resource utilization.
Oracle
Principal Systems Engineer
October 2014 - December 2022
Led the cloud migration strategy to Oracle Cloud Infrastructure (OCI), overseeing the transition of 40,000+ virtual machines and services, optimizing performance, cost, and scalability.
Pioneered the use of Infrastructure as Code (IaC) to automate infrastructure provisioning and management, streamlining operations and reducing deployment time by 60%.
Developed and maintained monitoring and alerting systems using ELK stack, Nagios, and Prometheus, ensuring proactive issue resolution and minimizing system downtime.
Championed the implementation of zero-downtime patching using Ksplice for Linux systems, reducing patching time by 90% and mitigating production disruptions.
Delivered comprehensive security auditing and log management solutions for compliance with PCI, SOC2, and FIPS, ensuring audit readiness and security best practices.
Micros Systems, Inc.
Senior Systems Engineer
August 2009 - September 2014
Led the design, deployment, and maintenance of high-availability Linux and Windows server systems, utilizing Puppet for configuration management and reducing build times by 95%.
Architected and managed the deployment of a CyberArk password management system, securing 5,000+ Linux and Windows systems and enhancing compliance with security standards.
Implemented automated Linux OS patching for critical production systems, significantly reducing manual intervention and ensuring system security and compliance.
Conducted capacity planning for data center operations, optimizing resource allocation and ensuring performance scalability under heavy workloads.
Certifications:
Certified Kubernetes Administrator (CKA)
AWS Certified Solutions Architect – Associate
Oracle Cloud Infrastructure 2020 Architect Professional
VMware Certified Professional – Data Center Virtualization
Oracle Cloud Infrastructure 2023 AI Foundations Associate
Red Hat Certified Engineer (RHCE)
CCNA, ITIL Foundation Certificate in IT Service Management
DevOps Fundamentals, Terraform, Ansible
Additional Information:
DOD IL5 US Person Verification clearance
Proficient in designing and implementing containerized solutions using Docker and Kubernetes for seamless application deployment and orchestration in cloud environments
Strong communicator and team player, with a proven ability to collaborate effectively with cross-functional teams to deliver solutions that enhance system reliability, security, and performance
Previous Experience:
System Administrator – Nepal Bank Limited (Jan 2006 – Jan 2009)
IT Consultant – Sagarmatha Chaudhary Eye Hospital (Jan 2007 – Oct 2008)
Education:
Maharishi University of Management
Fairfield, Iowa, USA
Master of Science in Computer Science
Graduation: May 2012
Pokhara University
Pokhara, Nepal
Bachelor of Computer Engineering
Graduation: Oct 2008