Cloud Engineer Reliability

Location:

San Jose, CA

Posted:

April 14, 2025


Resume:

Chandrakanth

San Jose, California

408-***-****, *********@*****.***

Professional Summary:

Accomplished IT professional with 18 years of overall experience, including a focused role as a Cloud Engineer. Expertise in managing and optimizing infrastructure and technology to drive operational efficiency. Passionate about automation, leveraging a variety of tools to enhance processes. Proficient in application deployment, developing effective monitoring solutions, and delivering high-quality application support.

Technical Proficiencies

DevOps Tools: Ansible, Git, Docker, Kubernetes, GitHub Actions

CI/CD: Spinnaker, Jenkins, Azure DevOps

Cloud Technologies: Azure, AWS, Google Cloud, knowledge of OCI

Monitoring Tools: Grafana, Kibana, Mosaic, Epic, nmsys, Prometheus, Nagios

Operating Systems: Linux, Unix, Windows

Change Management: JIRA, SNOW, Confluence, ITIL, Zendesk

Scripting: Bash, Python

Tools & Frameworks: Grafana, Splunk, JFrog, radar, stark, ELK/Elastic Stack, Kibana

Databases: SQL, MongoDB

Professional Experience

Wipro Ltd (November 2021 to present)

Client: Apple Inc.

Site Reliability Engineer, San Jose, California (November 2021 to March 5, 2025)

Responsibilities:

As a DevOps SRE at Apple, I managed and developed CI/CD pipelines for multiple applications, optimized Ansible playbooks, and troubleshot deployment issues. I also managed and coordinated offshore and onsite teams to ensure smooth collaboration and successful implementation of HSM application updates across various applications.

• Managed the Spinnaker CI/CD pipeline that automates build, test, and deployment activities for the HSM module.

• Developed, enhanced, tested, and maintained Ansible playbooks for the HSM module.

• Supported and deployed HSM infrastructure and operational changes using Spinnaker.

• Developed Bash and Python scripts for routine activities.

• Developed observability dashboards and enhanced existing monitoring dashboards in Grafana.

• Troubleshot deployment-related issues in production and non-production environments together with application developers.

• Built and configured virtual machines for production and non-production environments.


• Collaborated closely with development and testing teams, managing and configuring the infrastructure services needed to support new HSM feature development.

• Created runbooks and documented processes and procedures for the first-level support team's regular operational activities.

• Provided guidance and mentorship to support team members on HSM feature updates and deployment activities.

• Worked with third-party vendors to onboard HSM applications from scratch, covering development, testing, and automation followed by infrastructure provisioning.

• Coordinated onsite and offshore teams through the design, build, test, and deploy phases of development.

A Society (April 2021 – November 2021)

MediaKind, Santa Clara, California

Site Reliability Engineer (April 2021 – November 2021)

Managed and led onsite and offshore teams and oversaw support of the application product hosted in the Azure cloud as the cloud administrator. Aided automation and process improvement by collaborating with development and testing teams to develop Bash and Python scripts for set-top box tests. Enhanced monitoring using Grafana and Prometheus. Improved log extraction and data analysis by scripting existing internal tools. Provided after-hours on-call support and worked with customers after outages to develop improvement plans. Followed up with customers while collaborating closely with field engineers. Reported monthly and weekly updates to clients, completed delivery assurance reviews for pipeline releases, and handled outages quickly.

ITC Infotech (USA), Inc. (Nov 2011 – Feb 2021)

Ericsson, Santa Clara, California

Senior Technical Support Engineer (February 2020 – February 2021)

Used Prometheus to review and fine-tune monitoring and the configuration of OpsGenie alarms for a critical application. Managed continuous integration and delivery of applications, automating application configuration management with Salt. Deployed the application every release cycle and managed the MOP. Aided in analyzing network traces using Bastion. Performed log analysis using Google Stackdriver and set up alarms for application monitoring. Optimized and troubleshot system performance. Used Zendesk to create a team dashboard and Jira to track defects and changes. Experienced with CI/CD (Continuous Integration/Continuous Delivery) frameworks using Git, Maven, Docker, Kubernetes, and Jenkins for automated build and deployment. Ensured 24x7 service by providing after-hours support on a rotating basis.

MediaKind, Santa Clara, California

Site Reliability Engineer (April 2018 – February 2020)

Oversaw support of the application product hosted in the Azure cloud as the cloud administrator, including customer environment maintenance and configuration of virtual machines, storage accounts, resource groups, and access management. Aided development teams with daily activities in the cloud environment. Used Prometheus and Nagios for server monitoring, and Grafana and Kibana for log extraction and data analysis. Provided incident and bug management, technical help, diagnostics, and follow-up with customers while collaborating closely with field engineers. Reported monthly and weekly updates to clients, completed delivery assurance reviews for pipeline releases, and handled outages quickly. Aided automation and process improvement by collaborating with the team to develop Bash and Python scripts. Ensured 24x7 service by providing after-hours support on a rotating basis while mentoring both onsite and offshore team members.

Ericsson, Santa Clara, California

Site Reliability Operator (October 2014 – April 2018)

Directed Media First service requests, including user provisioning, client invites, environment requests, and deployments, among many other client requirements. Facilitated high-priority bridge moderation, live service monitoring, heads-up displays, and manual service checks, and escalated customer issues to the proper resources in adherence to the service-level agreement. Refined and improved internal tools and processes to support service availability and performance.

Microsoft Corporation, Mountain View, California

Operation Engineer, (April 2013 – October 2014)

Utilized various tools, including Nagios, Cacti, Check, and SCOM, to monitor critical online services. Managed customer incidents from the initial call to resolution, ensuring high reliability and performance for servers, network devices, and applications. Applied crisis management skills to coordinate complex incidents and outages, and documented troubleshooting guides and recovery procedures for future service improvements. Oversaw and tracked change requests in the operations center while also suggesting enhancements for tools and automation.

Education

Bachelor, Electronics & Telecommunications

Sri Siddhartha Institute of Technology, Tumkur, India
