Site Reliability Engineer

Location:

Irving, TX

Posted:

March 04, 2025


Resume:

CONFIDENTIAL & RESTRICTED

Phanindra Babu Naidu

**** ********* ****, ***#***, Irving, TX-75063

Mobile: 512-***-****, Email: **************@*****.***

Professional Summary:

Over 6 years of experience as a Site Reliability Engineer with a focus on scalability and reliability.

Expertise in cloud-based platforms, Continuous Integration, Configuration Management and Monitoring.

Skilled in troubleshooting, managing escalations, and ensuring 24/7 system availability.

Experienced with Agile and Scrum methodologies.

Hands-on experience with UNIX and Linux environments, SQL databases, and cloud platforms such as GCP, Azure and AWS.

Involved in designing, creating, and managing Continuous Build and Integration environments.

Enhanced build and deployment processes through automation using scripting and DevOps tools.

Well-versed in containerization and orchestration technologies, including OpenShift, Docker and Kubernetes.

Recognized for a methodical approach to addressing and fulfilling client and project needs.

Self-motivated fast learner with the ability to work in challenging, dynamic environments.

Technical Skills:

Source Control : Git, Git Bash

Build Automation : Jenkins

Configuration Management : Puppet, Ansible, Terraform

Containers : Docker, Kubernetes, OpenShift

Operating System : Linux, Windows, Mac

Virtualization : OpenStack, Azure

Cloud Technology : Azure, GCP, AWS

Programming : Python, Ruby, Shell Scripting


Work Experience:

Site Reliability Engineer at Amadeus June 2018 – December 2024

Managed multiple Kubernetes clusters across development, staging, and production environments, ensuring adherence to best practices for security and scalability.

Led the transition to microservices architecture using Docker and Kubernetes, facilitating faster development cycles and better resource utilization.

Created and maintained Helm charts to package Kubernetes applications, enabling consistent and repeatable deployments across environments.

Enforced security measures by implementing Role-Based Access Control (RBAC) and network policies, significantly enhancing the security posture of the Kubernetes clusters.

Developed and optimized CI/CD pipelines using Jenkins and GitLab CI, increasing deployment frequency from bi-weekly to multiple times per day, speeding up releases and reducing downtime.

Utilized Terraform for automated provisioning of cloud resources, streamlining infrastructure deployment and reducing setup time through Infrastructure as Code practices.

Automated infrastructure deployment and management on Azure and GCP using Terraform, ensuring consistent and reliable cloud environments.

Engineered solutions on Azure and GCP cloud platforms, leveraging native services for compute, storage, and networking to enhance application performance and scalability.

Implemented security best practices for Azure and GCP environments, including identity and access management (IAM), network security configurations, and resource monitoring.

Integrated cloud-native tools from Azure and GCP into existing workflows, enhancing system observability, cost management, and operational resilience.

Conducted an analysis of cloud resource utilization, leading to a reduction in cloud costs through rightsizing and elimination of underused resources.

Implemented infrastructure configuration management using Ansible and Puppet, ensuring consistency, reliability, and ease of deployment across multiple environments.

Developed Ansible playbooks to automate the configuration of servers and deployment of applications, reducing manual intervention and improving deployment speed.

Managed system configurations and automated routine maintenance tasks using Puppet, enhancing system stability and reducing configuration drift.

Designed and implemented a comprehensive monitoring system using Prometheus and Grafana, resulting in reduced incident response time and improved system reliability.

Built custom monitoring and alerting tools using Python to enhance visibility into system health and performance metrics, enabling proactive incident management.

Developed and maintained automated scripts to generate detailed reports on system performance, incident trends, and SLA compliance, allowing leadership to make data-driven decisions on system improvements and optimizations.

Wrote Ruby scripts to automate repetitive administrative tasks, streamline deployment processes, and improve overall workflow efficiency.

Created new farms across different environments of Amadeus internal systems and configured their integration with third-party platforms such as Akamai.

Monitored and optimized performance, ensuring compliance with stringent SLAs and providing rapid issue resolution to maintain system uptime, particularly during high-stakes events.

Partnered with travel agencies to troubleshoot and resolve complex issues in flight reservations, ensuring minimal disruption to services and high customer satisfaction.

Analyzed logs, fine-tuned queries, and adjusted system parameters to optimize application and database performance, reducing response times and improving overall system efficiency.

Participated in change management processes, reviewing and approving modifications to production systems to ensure compliance with company standards and minimal impact on performance or uptime.

Worked on capacity planning for systems and applications, ensuring that sufficient resources were allocated to handle traffic spikes and future growth without compromising system performance.

Led initiatives to enhance production support processes by reviewing incident logs, identifying recurring issues, and recommending architectural improvements to prevent future occurrences and improve deployment practices.

Managed regular backups and recovery plans for production databases and application servers, ensuring data integrity and availability in case of outages or system failures.

Mentored junior engineers on troubleshooting techniques, system monitoring, and production best practices, contributing to overall team development and improving response times for production issues.

Provided 24/7 on-call support as part of a rotating schedule, ensuring timely resolution of critical production incidents and maintaining strong system stability, even during off-hours.

Certifications:

● Microsoft Certified: Azure DevOps Engineer Expert

● Microsoft Certified: Azure Administrator Associate

● Microsoft Certified: Azure Fundamentals

Educational Qualifications:

Bachelor's in Computer Science & Engineering, April 2015
Jawaharlal Nehru Technological University, Kakinada, India

Master of Information Technology, December 2017
University of Mary Hardin-Baylor, Belton, Texas
Specialization: Information Systems
