Devops Engineer Reliability

Location: Hutto, TX
Posted: May 21, 2025

Resume:

Venkata Vamsi Krishna Goriparthi

*********@*****.*** 682-***-**** LinkedIn: https://www.linkedin.com/in/goriparthi8620

Professional Summary

Highly skilled DevOps Engineer and Site Reliability Engineer (SRE) with over 10 years of experience designing, deploying, and managing cloud-native solutions on AWS, Kubernetes, and distributed systems. Proven expertise in automating infrastructure, optimizing CI/CD pipelines, and ensuring high availability, scalability, and reliability of mission-critical applications. Adept at implementing Infrastructure as Code (IaC), containerization, and monitoring tools to streamline operations and enhance system performance. Certified AWS Solutions Architect with deep knowledge of microservices, chaos engineering, and disaster recovery strategies. Passionate about leveraging cutting-edge technologies to empower developers and deliver scalable, secure, and efficient IT solutions.

Technical Skills

Cloud Platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure

Containerization & Orchestration: Docker, Kubernetes (EKS, TKGI, Tanzu), ECS

Infrastructure as Code (IaC): Terraform, CloudFormation

CI/CD Tools: GitLab CI/CD, Jenkins, AWS CodeBuild, AWS CodePipeline

Monitoring & Observability: Prometheus, Grafana, CloudWatch, Splunk, ELK Stack, Dynatrace, DataDog, OpenSearch

Programming & Scripting: Python, Bash, JavaScript, Go, Ruby, Groovy

Networking: VPCs, Security Groups, VPNs, Firewalls, TCP/IP, DNS, DHCP, SMTP, SFTP

Disaster Recovery: Chaos Monkey, AWS FIS, BCP, Techline, AWS Backups, AWS ARC

Databases: Aurora PostgreSQL, MongoDB, MySQL, Redis Enterprise

ITSM Integration: ServiceNow, PagerDuty

Other Tools: Concourse, Vault, Ansible, AppDynamics, Wavefront

Professional Experience

Senior Cloud Operations Specialist

Wells Fargo (Contract) Oct 2024 – Jan 2025

PCF Platform Admin and Ops:

Spearheaded the deployment of Pivotal Cloud Foundry (PCF) in a new data center, configuring Ops Manager and deploying foundational tiles.

Automated routine platform maintenance using Concourse pipelines, reducing manual intervention by 40%.

Conducted capacity planning and optimized resource allocation across multiple PCF foundations, ensuring 99.9% uptime.

TKGI – Tanzu Kubernetes:

Deployed and maintained Tanzu Kubernetes Grid Integrated Edition (TKGI) clusters, ensuring high availability and scalability.

Configured RBAC policies and managed user access across Kubernetes clusters to align with organizational security standards.

Set up monitoring tools like Prometheus and Grafana to track cluster health and resolve issues related to node failures and workload disruptions.

Redis Enterprise Ops:

Managed Redis Enterprise clusters across production and staging environments, optimizing configurations for high availability and performance.

Automated backup and failover processes, reducing potential downtime during disaster recovery by 70%.

Created automation scripts for TLS configuration updates and resolved cross-region replication issues post-OS patching.

Senior Site Reliability Engineer

Charles Schwab & Co. Nov 2021 – Oct 2024

Cloud Operations and Automation:

Managed large-scale Kubernetes deployments (EKS) and orchestrated microservices architectures, enhancing scalability and fault tolerance.

Created Terraform modules for AWS resources (EKS clusters, IAM policies, EMR clusters) and utilized Terraform workspaces for different business lines.

Chaos Engineering:

Implemented Chaos Monkey within AWS and Kubernetes environments to simulate failure scenarios, reducing downtime by 15%.

Automated monitoring and reporting of failure recovery using Grafana dashboards, contributing to SRE best practices.

Disaster Recovery:

Configured automated AWS backups for DynamoDB and RDS; restored backups across accounts and regions during migrations.

Configured AWS ARC controller and created resource sets in a centralized account.

Designed and executed DR strategies for AWS global databases (RDS), automating region failovers and route switching during incidents.

Reduced Mean Time to Recovery (MTTR) by scripting automated ECS container failovers between regions.

Monitoring and Observability:

Integrated applications running in AWS ECS with AWS OTEL collector and used it as a data source in Amazon Managed Grafana and AWS X-Ray.

Site Reliability Engineer

Cambridge Mobile Telematics, Cambridge, MA Jun 2021 – Nov 2021

Description:

CMT is the world’s largest smartphone telematics provider, powering 79 programs around the globe with leading insurers, automotive manufacturers, and mobile network operators. Using mobile sensing and IoT, machine learning, and behavioral science, CMT’s telematics platform measures driving behavior to empower driver improvement, provides instant crash alerts and roadside assistance, and creates a smooth connected claims process to reduce costs and improve efficiency.

Responsibilities:

Participated in on-call rotations in a team of 8 members.

Deployed Secrets Manager secrets with Cognito app clients to enable SSO login to portal applications (web apps internal to CMT).

Converted existing IAM policy scripts to Terraform using the AWS provider and its IAM resources.

Imported existing manually created resources into Terraform state.

Worked with application teams to use Terraform for deploying CodeBuild projects and CodePipelines for ECS containers.

Deployed ECR repos and ECS clusters using Terraform and CloudFormation.

Set up infrastructure required for the ECS clusters including subnets, NACLs, Security Groups, CloudWatch alarms, IAM roles, and repository permissions.

Created Jira cards and integrated them with Bitbucket; set up pull-request-based pipelines.

Developed and consumed REST services with Spring Boot.

Used YAML and JSON files extensively.

Created new modules as needed to support application teams’ architectural needs.

Integrated PagerDuty with Jira to generate tickets for Severity Level 1 alerts.

Documented RCA for solved incidents and “how-to” guides for Day 2 operations.

Troubleshot certificate, DNS, and container metrics issues.

Tools & Environment: Cognito, AWS Terraform, Bitbucket, JSON, Spring Boot, Agile, ECS, Jira

Platform Engineer

The Vanguard Group Inc, Malvern, PA Jan 2020 – Jun 2021

Description:

Vanguard is the largest provider of mutual funds and the second-largest provider of ETFs in the world, offering brokerage services, annuities, educational accounts, financial planning, asset management, and trust services.

Responsibilities:

Acted as SME for PCF platform (32 foundations across 4 regions).

Participated in on-call rotations; led platform upgrades and network changes for PCF and AWS ECS.

Changed VM types to resolve disk-related issues and created custom VM types for PCF provisioning.

Deployed PCF foundations across vCenter and cloud CPIs; automated platform tasks with Concourse.

Scripted Bash for daily operations; automated certificate renewal for custom domains and platform certs.

Drove PCF upgrades (1.10 to 2.9) and quarterly admin-password rotations.

Configured SSL termination at load balancers and rotated certificates for routers.

Troubleshot ECS container memory/CPU issues and AWS API limits.

Participated in major incident calls and documented processes in Confluence; tracked backlogs in Jira.

Deployed AWS dashboards via CloudFormation and Bamboo; led hackathon for cert renewal automation.

Managed vendor relationships (Pivotal, Amazon) for feature requests and upgrades.

Assisted app teams migrating from PCF to ECS, building custom Docker images, and automating deployments.

Updated AWS security groups to maintain enterprise compliance; troubleshot cross-region connectivity via VPC flow logs.

Tools & Environment: PCF Platform, AWS ECS Platform, SSL Termination, GitHub, Bitbucket, Diego-cells, Bamboo, VPC

DevOps Engineer

Ford Motor Company, Dearborn, MI Sep 2017 – Dec 2019

Responsibilities:

Participated in a DevOps environment leveraging emerging cloud technologies.

Scaled platforms manually and automated upgrades.

Troubleshot application-team issues and identified root causes.

Implemented server automation, event management, and platform (PaaS) administration.

Maintained, deployed, and configured VMs on-prem and in Azure/AWS/vSphere.

Built out new PCF foundations off-prem (Azure US/CN) and on-prem (vSphere) using BOSH.

Developed scripts to promote automation; rotated credentials in CredHub quarterly.

Created CI/CD pipelines for deployment automation; performed BOSH-level troubleshooting of Diego cells and cloud controllers.

Deployed PCF on AWS; managed isolation-segment replication per application requirements.

Automated certificate deployments to HAProxy instances using Concourse; deployed Vault for key management.

Tools & Environment: PaaS, Cloud IaaS (Azure/AWS/vSphere), CI/CD pipelines, BOSH, Concourse, Vault

DevOps Engineer

Yaska Technologies Pvt Ltd, Hyderabad, TG, India Feb 2014 – Dec 2015

Description:

Yaska Technologies provides consulting services, custom software solutions, and data analytics for government, airport, and utility clients worldwide.

Responsibilities:

Provided configuration-management and build support for multiple applications in production and lower environments.

Defined and implemented CM and release-management processes, policies, and procedures.

Used Ansible to manage web-app configurations, mount points, and packages.

Configured Ansible and Puppet modules for OpenStack deployment and managed virtual/physical instance provisioning.

Authored and maintained build scripts in Ant, Python, and shell; migrated Ant scripts to Maven.

Managed Maven repositories and deployed snapshot/release artifacts to Nexus.

Configured and maintained Jenkins for CI, integrating with Ant and Maven.

Collaborated with development, testing, deployment, and infrastructure teams to ensure continuous build/test operations.

Worked on Oracle databases to execute DML/DDL for build systems.

Tools & Environment: Ant, Python, Ansible, Puppet, Maven, Jenkins, Oracle Databases

DevOps Engineer

Info care Softech Pvt Ltd, Hyderabad, TG, India Apr 2012 – Jan 2014

Description:

Info care Softech Pvt Ltd provides 24/7 support via its Global Operations Command Center and offers consulting services to reduce infrastructure costs and improve service levels.

Responsibilities:

Developed and supported software release management procedures.

Performed Subversion (SVN)/CVS support: branching, tagging, merging.

Coordinated release schedules with project managers.

Automated builds with Ant and Maven; managed Maven repositories via Nexus.

Documented release procedures and maintained continuous integration using Anthill Pro.

Created and distributed release notes; tracked status and scheduling issues.

Used ClearQuest for ticketing.

Tools & Environment: SVN, CVS, Perforce, Ant, Maven Scripts, SQA, Anthill Pro, ClearQuest

Education and Certifications

Master’s in Computer Science and Information Technology, Wilmington University, 2017 (GPA: 3.62)

AWS Certifications:

AWS Solutions Architect Associate

AWS Solutions Architect Professional

Key Achievements

Reduced manual intervention in platform maintenance by 40% through Concourse pipeline automation.

Improved incident response time by 50% using Amazon Managed Grafana for Redis monitoring.

Automated disaster recovery processes, reducing MTTR significantly during regional failovers.

Enhanced system reliability by integrating Chaos Monkey with Kubernetes and AWS environments.


