Site Reliability Cloud Platform

Location:

United States

Posted:

January 24, 2025

Resume:

Utkarsh Gandhi

www.linkedin.com/in/utkarshpgandhi

****************@*****.*** 910-***-****

Summary:

Engineer with 11 years of experience in Site Reliability Engineering (SRE), DevOps, AWS, and Build/Release Engineering, with expertise in Software Configuration Management (SCM), Build/Release Management, Continuous Integration, and Continuous Delivery across a wide range of tools.

Skilled in configuring and deploying infrastructure and applications in the cloud using AWS services such as EC2, S3, RDS, EBS, VPC, SNS, IAM, Route 53, Auto Scaling, CloudFront, CloudWatch, CloudTrail, CloudFormation, OpsWorks, and Security Groups, with a focus on fault tolerance and high availability.

Strong understanding of SCM processes, including compiling, packaging, and deploying applications.

Proficient in Continuous Integration and Deployment methodologies using Jenkins, SonarQube, and GitLab.

Used Infrastructure as Code (IaC) and CI/CD pipelines to automate deployment processes on Google Cloud Platform.

Skilled in troubleshooting production issues related to CPU resource utilization, application performance, and code logic.

Solid knowledge of Object-Oriented Design and Programming concepts in Java.

Experienced in developing, maintaining, and troubleshooting scripts in Shell (Bourne), Python, C, and Perl.

Proficient in using build automation tools such as Jenkins and Maven to implement end-to-end automation, with working experience in Dynatrace.

Hands-on experience testing web services using Postman and SOAP.

Utilized AWS CloudWatch to monitor environments for operational and performance metrics during load testing.

Extensively worked with Docker for virtualization, deploying and securing applications for streamlined Build/Release Engineering processes.

Skilled in creating Docker containers from scratch and leveraging Linux Containers and AMIs, along with Dockerfiles.

Managed Docker containers with Kubernetes, automating container maintenance and working with REST APIs.

Utilized Terraform for managing AWS Infrastructure as Code (IaC).

Designed scalable, reliable, and efficient systems on Google Cloud Platform using services such as Compute Engine, App Engine, and Kubernetes.

Integrated machine learning models into production environments using CI/CD pipelines, leveraging AWS services, Kubernetes, and Docker for automated deployment and monitoring.

Collaborated with data scientists and developers to streamline model versioning, testing, deployment, and monitoring, ensuring the smooth transition of models from development to production.

Leveraged AI-driven monitoring tools, such as Splunk and AWS CloudWatch, to automate incident detection, root cause analysis, and performance optimization, enhancing system reliability and operational efficiency.

Actively mentored junior engineers, providing guidance on best practices for DevOps, AWS infrastructure, Build/Release Engineering, and Linux environments.

Conducted training sessions and code reviews, fostering the professional growth of team members and improving the overall effectiveness of the engineering team.

Created dashboards for log analysis and visualization using Prometheus and Grafana.

Leveraged monitoring tools such as CloudWatch and Splunk for log analysis, performance monitoring, and dashboard creation in production.

Provided 24x7 production support, including on-call and weekend shifts.

Experienced in troubleshooting, backup, and recovery processes.

Skills: