Shashank Chekatla
Email: ***********@*****.***
Phone: +1-678-***-****
Sr. DevOps Engineer / SRE
DevOps expert and AWS Developer, SysOps, and Solutions Architect with 9 years of diverse IT experience in all phases of the SDLC, involving analysis, design, development, deployment, testing, and implementation of client/server applications built with Java, Python, Microsoft .NET, and C#. Experienced in Configuration Management, Continuous Integration and Continuous Deployment (CI/CD), and cloud platforms such as Amazon Web Services (AWS), Azure, PCF, and OpenStack on Linux and Windows, with strong knowledge of Software Configuration Management (SCM) principles and best practices in Agile, Scrum, and Waterfall methodologies.
PROFESSIONAL SUMMARY:
●Experienced in setting up database instances in Amazon Web Services (AWS) using RDS and DynamoDB, configuring storage using S3 buckets, setting up instance backups and archiving to Amazon Glacier, and configuring AWS Redshift for data warehousing.
●Experienced in building sophisticated and highly automated infrastructure as code using the Terraform and CloudFormation automation frameworks on GCP and AWS.
●Built CI/CD pipeline using Google Cloud Build and Cloud Source Repositories in both cloud and On-Premises with Docker, Maven, and Git along with Jenkins pipeline builds using YAML/files.
●Hands-on experience with AWS services such as EC2, S3, RDS, VPC, SNS, EBS, ELB, CloudWatch, IAM, and Route 53.
●Created multiple Dynatrace dashboards for different applications and infrastructure with specific levels of metrics.
●Good understanding of the various IAM modules, such as Identity Management, Identity Governance, Access Management, and Life Cycle Management.
●Worked with the IAM/Security team in managing technologies such as vulnerability assessment tools, identity and access management, web content filtering, and VPN/two-factor authentication planning and solutions.
●IAM experience including configuration, administration, and installation of the IAM suite; also worked on CA SiteMinder.
●Experienced on various Google Cloud Platform (GCP) services like Compute Engine (VM’s), Big Query, Cloud Identity and Access Management (IAM), App Engine (Web Roles), Cloud SQL, Cloud Storage, Virtual Private Cloud (VPC), Cloud Load Balancing, Auto Scaling, and Cloud Functions.
●Performed regular deep dives into application performance using available tools such as Dynatrace, AppDynamics, and Splunk.
●Designed and implemented GCP organization setup, project setup, and IAM access and GCP service account setup for development, QA, and production support teams.
●Set up Dynatrace agents for application servers and analyzed issues using Dynatrace.
●Experienced in configuring and deploying Cloud Deployment Manager templates and Cloud Functions for applications utilizing the GCP services including Compute Engine, Cloud Storage, Cloud Datastore, Cloud SQL, Cloud Pub/Sub, Google Kubernetes Engine (GKE), and Cloud IAM, focusing on Automation.
●Performed log monitoring using Dynatrace and Splunk and correlated logs to find root causes.
●Configured identity and access management (IAM) roles and permissions to ensure secure access control to AlloyDB resources.
●Extensive experience in the implementation of Continuous Integration (CI), Continuous Delivery, and Continuous Deployment (CD) on various Java based Applications using Jenkins, Ansible, Maven, Git, Nexus, Docker, and Kubernetes.
●Experience in creating alarms and notifications for EC2, Lambda instances using Cloud Watch.
●Administered and optimized AlloyDB instances for high availability and scalability on Google Cloud Platform (GCP).
●Experienced in working with Ansible Tower to manage multiple nodes and inventory for different environments, automated repetitive tasks, and deployed critical applications on cloud using Ansible, and Google Cloud Deployment Manager.
●Implemented automated backup, recovery, and failover strategies using AlloyDB's built-in tools.
●Experience with APM tools such as Dynatrace, AppDynamics, and Datadog.
●Worked on P1 and P2 issues using Dynatrace and Datadog APM.
●Implemented Dynatrace on different cloud technologies like AWS, Azure and GCP.
●Managed GPU-accelerated infrastructure: led the design, deployment, and management of GPU clusters for high-performance computing workloads, ensuring optimal resource allocation and minimal downtime.
●Integrated GPU-based workloads into CI/CD pipelines, automating model training and deployment processes and reducing deployment times by X%.
●Implemented monitoring and alerting for GPU workloads by developing robust monitoring solutions for GPU utilization (using Prometheus and Grafana) to track performance metrics such as GPU memory, utilization, temperature, and power consumption, reducing system failures by X%.
●Implemented audit logging and monitoring for AlloyDB to track user activities and database access.
●Integrated Dynatrace with ServiceNow, Jira, and PagerDuty.
●Integrated AlloyDB with other GCP services (e.g., BigQuery, Dataflow, Pub/Sub) to enable efficient data pipelines and analytics.
●Experienced in working on Chef with Knife commands to manage Nodes, Cookbooks, Chef Recipes, Chef attributes, Chef Templates and used Ruby scripting on Chef automation for creating cookbooks comprising all resources, templates, attributes.
●Configured WebSphere application server JVMs with Dynatrace agents and set up Dynatrace for the production environment.
●Experienced in building Docker containers using docker compose, automated Docker Image build and pushed those artifacts to Nexus using Jenkins.
●Analyzed time-series metric logs for upstream and downstream traffic in Prometheus/Kafka and Grafana dashboards.
●Experienced in managing on-site OS, Applications, Services, Packages using Chef as well as GCP for Compute Engine, Cloud Storage, Cloud DNS, and Cloud Load Balancing with Chef Cookbooks.
●Experienced in writing Ansible Playbooks for automating CI/CD pipelines, deploying, updating launch configuration for microservices, provisioning load balancers, and auto-scaling instances.
●Provisioned Kubernetes clusters on AWS and GCP, including master and slave nodes, RBAC, Helm, kubectl, and ingress controllers, via Terraform foundation modules.
●Integrated New Relic with CI/CD pipelines to automate the monitoring setup for new deployments.
●Proficient in managing DNS configurations, including addressing and name resolution within TCP/IP networks, ensuring seamless communication between services.
●Experienced in creative and effective front-end development using JSP, JavaScript, HTML, jQuery, AngularJS, Bootstrap, ReactJS, AJAX, and CSS.
●Expertise in Shell and Python scripting with a focus on DevOps tools; created Shell, Ruby, Bash, Groovy, Python, and PowerShell scripts for automating tasks.
●Experienced in working with system health and performance monitoring tools such as Nagios, Splunk, CloudWatch, Azure Monitor, Google Stackdriver, and the ELK (Elasticsearch, Logstash, and Kibana) stack.
●Deployed HashiCorp Vault in a high-availability configuration to ensure continuous access to secrets.
●Implemented Kubernetes manifest, HELM charts for deployment of microservices into k8s clusters.
●Decommissioned AppDynamics and replaced it with Datadog.
●Created Pub/Sub topics and subscriptions for user activations.
●Retired the CentOS image and replaced it with an Ubuntu image.
●Experienced in deploying and configuring Git repositories with branches, forks, tags, cloning, labels and merge requests.
●Experience in Installation, Configuration, Management of various flavors of Linux (RHEL, CentOS, and Ubuntu), and built customized infrastructure for multiple applications on cloud, on-premises, and established connectivity between all the resources.
●Extensively worked on Jenkins for Continuous Integration (CI) from pulling code from version control tools like GIT, Subversion, and building deployable artifacts from source code using build tools like Ant, Maven and used artifacts repository managers like Nexus, JFrog for storing the builds.
●Worked on Argo CD setup for applications by making Helm changes and creating YAML files.
●Complete understanding of Software Development Life Cycle (SDLC) with Agile Methodologies.
●Responsible for automation and orchestration of cloud services offerings on AWS.
●Designed utilities using the .NET Framework that run through Azure release pipelines.
●Proficient in deploying and managing PHP applications in cloud environments such as AWS, Azure, and GCP.
●Experienced in deploying Puppet, Puppet Dashboard and Puppet DB to existing infrastructure and created puppet modules for Protocols configuration and managing them using Puppet automation.
●Integrated HashiCorp Vault with CI/CD pipelines to fetch and inject secrets during the deployment process.
●Experience in setting up JIRA as defect tracking tool and configured various plugins, workflows, customizations for JIRA bug tracker. Involved in planning and execution of the migration from Bugzilla-based bug-tracking and Jenkins CI tool into JIRA.
●Created service accounts and assigned the required permissions using Terraform scripts.
●Proficient in scripting languages such as Shell, Python, Ruby, PowerShell, Bash, YAML, and Groovy.
●Expert in using source code version tools like Subversion, Git on LINUX and Windows environment.
●Used databases like RDS, MySQL, and DynamoDB to perform basic database administration.
●Involved in Day-to-day administration of the environment systems like Development, Production and Test and provided 24x7 system on-call support.
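The CloudWatch alarm work described above can be illustrated with a minimal Python sketch. The instance ID, SNS topic ARN, and threshold below are hypothetical placeholders, not values from any actual environment; the dict is shaped for boto3's `put_metric_alarm` call.

```python
# Minimal sketch of defining a CloudWatch CPU alarm for an EC2 instance.
# All resource names and thresholds are hypothetical placeholders.

def cpu_alarm_params(instance_id, sns_topic_arn, threshold=80.0):
    """Build the keyword arguments for boto3's put_metric_alarm call."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                # evaluate 5-minute averages
        "EvaluationPeriods": 2,       # alarm after two consecutive breaches
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify an SNS topic
    }

# In practice the dict would be passed to the CloudWatch client, e.g.:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     **cpu_alarm_params("i-0abc123", "arn:aws:sns:us-east-1:111122223333:ops"))
```

Keeping the alarm definition as a plain function makes it easy to unit-test the parameters before touching any AWS account.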
EDUCATION:
Bachelor's in Computer Science – JNTUH, India.
TECHNICAL SKILLS:
Cloud Technologies
Google Cloud Platform (GCP), AWS, Microsoft Azure, RedHat OpenStack
Platforms
C, C++, C#, Java/J2EE
Scripting
Shell, Python, Ruby, PowerShell, YAML, Groovy, Bash
Web Technologies/Frameworks
HTML, JSP, JSTL, JavaScript, CSS, Servlets
Version Control Tools
GIT, SVN, GitLab, GitHub, Bitbucket
Middleware Tools
WebSphere Application Server, WebSphere Commerce Server, TC Server, JBoss, IIS, Apache Tomcat, Tibco, and BO
Build Tools
Ant, Maven
Configuration Management
Terraform, Chef, Puppet, Ansible
Continuous Integration Tools
Jenkins
Ticketing Tools
JIRA, Bugzilla and Confluence
Monitoring Tools
Nagios, Splunk, CloudWatch, ELK Stack, Dynatrace, Datadog, AppDynamics
Artifactory Repositories
Nexus, JFrog
Methodologies
Agile, Waterfall
Operating Systems
Unix/Linux (Red Hat, CentOS, SUSE), Solaris, Ubuntu, Windows
Databases
Oracle, MS SQL Server, MySQL, DynamoDB, MongoDB, NoSQL, PostgreSQL, AlloyDB
Virtualization
Virtual Box, VMWare, Windows Hyper-V
Containerization tools
Docker, Docker Swarm, OpenShift, Kubernetes
Work Experience:
Equifax, St. Louis, MO
Site Reliability Engineer (GCP) Sep 2022 – Jan 2025
Roles & Responsibilities:
●Integrated Terraform with GCP Cloud Build for continuous integration and continuous deployment (CI/CD) pipelines, enabling automated infrastructure deployments and updates.
●Integrated GitOps workflows with continuous integration and continuous deployment (CI/CD) pipelines, enabling automated builds, testing, and deployments based on Git events.
●Implemented Terraform modules and compositions for building reusable and modular GCP infrastructure components, enabling efficient resource provisioning and maintenance.
●Designed Google Cloud Deployment Manager templates to create custom Virtual Private Cloud (VPC), subnets, and Cloud NAT to ensure successful deployment of Web applications and database templates.
●Worked on Terraform for provisioning of Environments in GCP platform.
●Contributed to the development and maintenance of Git repositories containing Infrastructure-as-Code (IaC) configurations, Kubernetes manifests, and application source code.
●Created Cloud Functions and assigned roles in Google Cloud Functions to run Python scripts and utilized Google Cloud Functions to perform event-driven processing. Configured Cloud Functions jobs and roles using Google Cloud SDK.
●Responsible for managing the GCP services such as Compute Engine, App Engine, Cloud Storage, VPC, Load Balancing, Big Query, Firewalls, Log Analytics, Splunk and Stack Driver.
●Performed regular deep dives into application performance using available tools such as Dynatrace, AppDynamics, and Splunk.
●Setup GCP Firewall rules to allow or deny traffic to and from the VM's instances based on specified configuration and used GCP cloud CDN (content delivery network) to deliver content from GCP cache locations drastically improving user experience and latency.
●Worked on Terraform scripts and wrote various modules for spinning up VMs, buckets, etc.
●Responsible for Deploying Artifacts in GCP platform by using Packer.
●Good understanding of the various IAM modules, such as Identity Management, Identity Governance, Access Management, and Life Cycle Management.
●Worked with the IAM/Security team in managing technologies such as vulnerability assessment tools, identity and access management, web content filtering, and VPN/two-factor authentication planning and solutions.
●IAM experience including configuration, administration, and installation of the IAM suite; also worked on CA SiteMinder.
●Retired the CentOS image and replaced it with an Ubuntu image.
●Experience with APM tools such as Dynatrace, AppDynamics, and Datadog.
●Created and maintained Datadog alerts in UAT and Prod for all applications owned by the team.
●Performed log monitoring using Dynatrace and Splunk and correlated logs to find root causes.
●Implemented Terraform configurations for deploying and managing Google Kubernetes Engine (GKE) clusters, including node pools, networking, and authentication/authorization configurations.
●Wrote queries in Dynatrace RUM to pull user-experience data.
●Implemented robust branching strategies, such as Git Flow, Trunk-Based Development, and GitHub Flow, aligning with project requirements and team workflows.
●Worked on GKE topology diagrams including masters, slaves, RBAC, Helm, kubectl, and ingress controllers.
●Created projects, VPCs, subnetworks, and GKE clusters for the QA3, QA9, and prod environments using Terraform.
●Used Google Kubernetes Engine (GKE) for deploying and scaling web applications and services developed with Java, PHP, Node.js, Python, and Ruby on familiar servers such as Apache and Nginx.
●Implemented Dynatrace on different cloud technologies like AWS, Azure and GCP.
●Created and deployed VM instances on Google Cloud Platform, managed the virtual networks to connect all servers, and designed, deployed Infrastructure-as-Code (IaC) applications using Google Cloud Deployment Manager (YAML) templates.
●Implemented GCP Cloud Armor for Web Application Firewall (WAF) capabilities, protecting applications from common web vulnerabilities and distributed denial-of-service (DDoS) attacks.
●Configured Google Kubernetes Engine (GKE) for deploying and orchestrating containers by defining tasks and services. Leveraged Blue-Green deployment by developing Ansible playbook to change configuration of services to ramp up or down the number of Tasks running in the overall cluster.
●Automated build and operational tasks using Python and Cloud Shell scripts.
●Provisioned the highly available Compute Engine Instances using Terraform and Cloud Deployment Manager and wrote new Python scripts to support new functionality in Terraform.
●Deployed Prometheus with Grafana to monitor the Kubernetes cluster and configured alerts to fire when various conditions were met. Also set up the Nginx Ingress controller to manage ingress/egress routing rules for Kubernetes.
●Developed and deployed BigQuery Machine Learning (BQML) models for predictive analytics and advanced data insights directly within the data warehouse.
●Created Clusters using Google Kubernetes Engine (GKE) and worked on creating many pods, replica sets, services, deployments, labels, and health checks using YAML files. Developed automation of Kubernetes Clusters via playbooks in Ansible.
●Worked on P1 and P2 issues using Dynatrace and Datadog APM and worked on setting up Dynatrace RUM.
●Created Pub/Sub topics and subscriptions for user activations.
●Configured BigQuery security and access controls using IAM roles, data encryption, and row-level security for secure and compliant data access.
●Worked on Argo CD setup for applications by making Helm changes and creating YAML files.
●Integrated Dynatrace with ServiceNow, Jira, and PagerDuty.
●Utilized Terraform for provisioning and managing GCP services like BigQuery, Cloud Dataflow, and Cloud Dataproc for building data engineering and analytics pipelines.
●Designed and deployed GitOps workflows using ArgoCD, enabling automated sync and reconciliation of desired state across multiple environments.
●Implemented security and compliance controls within Terraform configurations for GCP resources, including resource labeling, service account management, and integration with GCP Security Command Center.
●Created service accounts and assigned the required permissions using Terraform scripts.
●Designed and implemented Google Cloud Virtual Private Networks (VPN) with subnets & Firewall rules. Configured Google Cloud Security using IAM, Google Cloud Security Command Center, and Google Cloud Monitoring Services.
●Decommissioned AppDynamics and replaced it with Datadog.
●Wrote Python, Ruby, Bash scripts to monitor the installed enterprise-level applications and manage the configurations of multiple servers using Ansible.
●Utilized GCP Cloud Data Loss Prevention (DLP) for identifying and redacting sensitive data, ensuring data privacy and regulatory compliance.
●Implemented security best practices for GCP deployments, including resource hierarchy, separation of concerns, least privilege access, and centralized audit logging.
●Used Dynatrace APIs to create dashboards and reports for a few applications.
●Worked on a Jenkinsfile with multiple stages: checking out a branch, building the application, testing, pushing the image to GCR, deploying to QA3, deploying to QA9, acceptance testing, and finally deploying to Prod.
●Configured GCP Virtual Private Cloud (VPC) Service Controls for secure data ingress and egress, enabling controlled access to sensitive data stores.
●Created and managed IAM policies for Cloud Storage buckets, utilized Cloud Storage for data storage, and moved data to Cloud Storage Nearline for archival storage.
●Involved in setting up Autoscaling of instance groups using Google Cloud SDK (command line interface) and implemented for Google Cloud environments such as Production/Development/Testing environments.
●Experienced in creative and effective front-end development using JSP, JavaScript, HTML, jQuery, AngularJS, Bootstrap, ReactJS, AJAX, and CSS.
●Deployed Ansible playbooks in GCP environment using Google Cloud Deployment Manager as well as created Ansible roles using YAML. Used Ansible to configure Apache servers and their maintenance.
●Used Ansible playbooks to setup Continuous Delivery (CD) pipeline which primarily consists of Jenkins to run packages and its supporting software components which are from Maven build tool.
●Installed and configured Splunk, Datadog, Nagios to monitor applications deployed on the application server, by analyzing the application and server log files. Worked on the setup of various dashboards, reports.
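The GKE deployment bullets above (pods, replica sets, deployments, labels, and health checks defined in YAML) can be sketched in Python by templating the manifest as a dict. The app name, image path, and probe endpoint below are hypothetical placeholders; in practice the dict would be dumped to YAML and applied with kubectl or synced by Argo CD.

```python
import json

# Minimal sketch of templating a Kubernetes Deployment manifest in Python.
# App name, image, and health-check path are hypothetical placeholders.

def deployment_manifest(app, image, replicas=3):
    """Build an apps/v1 Deployment with matching labels and a readiness probe."""
    labels = {"app": app}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},   # must match pod labels
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": app,
                        "image": image,
                        "readinessProbe": {        # health check, as in the GKE bullets
                            "httpGet": {"path": "/healthz", "port": 8080},
                        },
                    }],
                },
            },
        },
    }

manifest = deployment_manifest("web", "gcr.io/my-project/web:1.0", replicas=2)
print(json.dumps(manifest, indent=2))
```

Generating manifests from one function keeps labels and selectors consistent across QA and prod variants of the same deployment.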
Environment: GCP, Kubernetes, Docker, Jenkins 2.0, Apache, Nginx, Ubuntu Linux, VMware ESX, Python, Git, Gitlab, Bash, Ruby, Groovy, Yaml, SonarQube, Terraform, Ansible, Agile/Scrum, SDLC.
CAPITAL ONE, VA Jan 2021 – Aug 2022
Sr. DevOps Engineer/SRE
Description: As an AWS DevOps Engineer at Capital One, I played a key role in streamlining the CI/CD pipeline by configuring Jenkins for builds and static and dynamic scans, and integrating SonarQube for code quality analysis. Automated AWS services using Python scripts, implemented Kubernetes best practices, and contributed to continuous improvement sessions. Managed Git repositories, branches, and merges, and orchestrated a robust Continuous Delivery pipeline with Git, Jenkins, Docker, and AWS AMIs, enhancing the efficiency of the development lifecycle. Set up Dynatrace agents for application servers and analyzed issues using Dynatrace.
Roles & Responsibilities:
Served as SRE for a team spanning different development teams and multiple simultaneous application/software releases; managed services such as EC2, S3 buckets, Route53, ELB, and EBS.
Involved in Architecting, building, and maintaining highly available secure multi-zone AWS cloud infrastructure utilizing Chef with AWS CloudFormation and Jenkins for continuous integration.
Created a three-tier architecture (WAF, web, and app layers) using CloudFormation templates; deployed and monitored scalable infrastructure on Amazon Web Services (AWS) with configuration management using Chef.
Worked on migrating a current application to microservices architecture. This architecture included Docker as the container technology with Kubernetes.
Expertise in creating builds using Shell scripts and Ant/Maven scripts, both manually and automated.
Skilled in the development and execution of XML and Perl scripts. Proficiently used JIRA and Confidential Service Manager tools to track all defects and changes related to build and release.
Utilized Kubernetes as the runtime environment of the CI/CD system to build, test, and deploy.
Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef; designed custom-built cloud-hosted solutions, with specific AWS product suite experience.
Set up Dynatrace agents for application servers and analyzed issues using Dynatrace.
Created multiple Dynatrace dashboards for different applications and infrastructure with specific levels of metrics.
Coordinated with different project teams, to build and release planning efforts for a better view of the release process and policies for projects early in SDLC.
Responsible for nightly and weekly builds for different modules; built and deployed Java applications in different environments.
Developed robust monitoring solutions for GPU utilization (using Prometheus, Grafana) to track performance metrics such as GPU memory, utilization, temperature, and power consumption, reducing system failures by X%.
Led the design, deployment, and management of GPU clusters for high-performance computing workloads, ensuring optimal resource allocation and minimal downtime.
Integrated GPU-based workloads into CI/CD pipelines, automating model training and deployment processes, and reducing deployment times by X%.
Created Ansible playbooks to install & configure Kafka to replicate topics across a multi-AZ cluster.
Installed Zookeeper on Kafka and configured it to maintain a checkpoint of how much stream is processed through Kafka.
Administered and optimized AlloyDB instances for high availability and scalability on Google Cloud Platform (GCP).
Good understanding of the various IAM modules, such as Identity Management, Identity Governance, Access Management, and Life Cycle Management.
Worked with the IAM/Security team in managing technologies such as vulnerability assessment tools, identity and access management, web content filtering, and VPN/two-factor authentication planning and solutions.
IAM experience including configuration, administration, and installation of the IAM suite; also worked on CA SiteMinder.
Performed ongoing monitoring, automation, and refinement of data engineering solutions; prepared complex SQL views and stored procedures in Azure SQL DW and Hyperscale.
Loaded different files from ADLS into the target Azure Data Warehouse using U-SQL scripts.
Managed hosting plans for Azure infrastructure, implementing and deploying workloads on Azure virtual machines (VMs).
Deployed Azure IaaS virtual machines (VMs) and Cloud services (PaaS role instances) into secure VNets and subnets.
Replicated VMware VMs to Azure with Site Recovery.
Created and managed virtual machines in Windows Azure and set up communication with Network Security Groups.
Extensive experience deploying compute and storage in the Azure cloud.
Extensive experience implementing software throughout the SDLC process, with deep hands-on experience in networking, migration, and implementation in Azure.
Configured private and public-facing Azure load balancers.
Exposed virtual machines and cloud services in the VNets to the Internet using the Azure External Load Balancer.
Scripting experience in Python, PowerShell, Groovy, and Ruby for automation purposes.
Implemented RAD builds in AnthillPro and automated the build and deploy process.
Wrote Chef cookbooks and recipes to automate the installation of applications and middleware infrastructure such as Apache Tomcat, along with configuration tasks for new environments, and used Test Kitchen for testing and developing cookbooks in DEV.
Configured WebSphere application server JVMs with Dynatrace agents and set up Dynatrace for the production environment.
Implemented Maven builds to automate artifacts such as JAR, WAR, and EAR files and implemented continuous integration using Jenkins.
Good knowledge of managing Sonatype Nexus/Artifactory repositories for Maven artifacts and dependencies.
Updated Jenkins pipelines and OpenShift templates to make use of the new environment.
Managed automation playbooks and documentation related to OpenShift.
Installed and configured Ansible Tower with multi-AZ configuration across multiple environments backed up by RDS for state preservation.
Applied fine-grained identity and access management (IAM) policies to restrict GPU usage to authorized users and roles, strengthening system security and compliance.
Implemented automated backup, recovery, and failover strategies using AlloyDB's built-in tools.
Integrated AlloyDB with other GCP services (e.g., BigQuery, Dataflow, Pub/Sub) to enable efficient data pipelines and analytics.
Led floor stand-up meetings to take updates on the project and recognized team members for their performance.
Built CI/CD pipelines using Azure DevOps and Jenkins and deployed Docker containers to Kubernetes.
Configured LDAP and maintained organizations within the tower to support multiple teams and maintain segregation of duties across prod and non-prod environments.
Directed and led all build activities toward automation, helping the team in every possible way.
Configured callbacks to Ansible tower to provision machines during auto-scaling.
Administered and Engineered Jenkins for managing weekly Build, Test and Deploy chain, SVN/GIT with Dev/QA/Prod Branching Model for weekly releases.
Configured identity and access management (IAM) roles and permissions to ensure secure access control to AlloyDB resources.
Installed a Docker registry for local upload and download of Docker images, including from Docker Hub.
Responsible for handling the performance issues of applications deployed in the AWS environment.
Rehydrated AMIs every 30 days in the DEV/QA/Prod environments to stay in compliance.
Implemented audit logging and monitoring for AlloyDB to track user activities and database access.
Configured CloudWatch alarms for AWS resources such as EC2, ELB, and RDS.
Used GitHub as source code repositories and managed Git repositories for branching, merging, and tagging, and analyzed and resolved conflicts related to merging of source code in GIT.
Created Jenkins jobs to build infrastructure and configure Route53 record sets utilizing CloudFormation nested stacks.
Created Lambda Function for RDS Database replication from East to West and created Lambda Function to stop RDS instances during off-hours and start during business hours.
Installed Filebeat and Configured ELK for monitoring applications in DEV/QA/PROD and configured ELK Visualizations and Dashboards and ELK email alerts.
Initiated and led calls.
Created JIRA intake request pages for multiple teams across LOB.
Created MBM plans for application/software releases in the Production Environment.
Managed SLA/SLO tracking for all the applications.
Worked with the DEV team on building a full CI/CD pipeline on Jenkins and configured Git with Jenkins and scheduled jobs using the Poll SCM option.
Wrote a PowerShell script to fully automate the installation of an application without human intervention.
Built servers using AWS: imported volumes, launched EC2 and RDS instances, and created security groups.
Created ECS Container Clusters, Target Groups, and ALBs using Terraform.
Created SLAs for the process.
Supported and helped new hires with setting up access.
Responsible for providing analysis of problems and resolutions or fixes for the production issues related to the Splunk platform within SLA.
Proactively monitored production SLOs for the products.
Monitored application performance to ensure it met SLOs.
Developed SLA/SLO reports on system availability based on monitoring data.
Developed and supported the Red Hat Enterprise Linux-based infrastructure in the cloud environment.
Provided 24/7 on-call support on Linux Production Servers. Responsible for maintaining security on RedHat Linux.
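The 30-day AMI rehydration compliance check mentioned above can be sketched with a small Python helper. The AMI records below are hypothetical; in a real pipeline they would come from an EC2 `describe_images` call, and stale AMIs would trigger a rebuild job.

```python
from datetime import datetime, timedelta, timezone

# Minimal sketch of a 30-day AMI rehydration check.
# Image IDs and dates are hypothetical placeholders.

REHYDRATION_WINDOW = timedelta(days=30)

def stale_amis(images, now=None):
    """Return the AMI IDs whose CreationDate is older than the 30-day window."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for image in images:
        created = datetime.fromisoformat(image["CreationDate"])
        if now - created > REHYDRATION_WINDOW:
            stale.append(image["ImageId"])
    return stale

images = [
    {"ImageId": "ami-old", "CreationDate": "2024-01-01T00:00:00+00:00"},
    {"ImageId": "ami-new", "CreationDate": "2024-02-20T00:00:00+00:00"},
]
print(stale_amis(images, now=datetime(2024, 3, 1, tzinfo=timezone.utc)))
# -> ['ami-old']
```

A scheduled Jenkins job running a check like this can open the rebuild ticket automatically instead of relying on a manual calendar reminder.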
Environment: AWS, GCP, Azure, Kubernetes, Docker, Jenkins 2.0, ELK, EBS, Jira, Apache, GPU Clusters, Nginx, Nexus, UDeploy, Ubuntu Linux, Git, Gitlab, Python, Ruby, Groovy, Yaml, Chef, Terraform, ANT, Maven, Jira, Agile/Scrum, SDLC.
Change Health Care, GA. Dec 2018 to Dec 2020
Sr. DevOps Engineer / SRE
Roles & Responsibilities:
Worked on installation, configuration, and maintenance of Debian/RedHat, CentOS, and SUSE servers at multiple data centers.
Configured RedHat Kickstart for installing multiple production servers.
Installation,