SUDHAKAR BATHINI
Sr. AWS DevOps Lead / Site Reliability Engineer
Mail: **********@*****.*** Ph. No: +1-279-***-****
LinkedIn: www.linkedin.com/in/bathinisudhakar
Professional Summary
12+ years of experience, including 8+ years as a Cloud DevOps and Site Reliability Engineer, with expertise in build and release engineering across Red Hat Linux, CentOS, and Ubuntu environments. Extensive experience in cloud computing with AWS, specializing in DevOps practices such as Continuous Integration (CI) and Continuous Deployment (CD) using tools like Jenkins, AWS DevOps, and Ansible. Additionally, 4+ years of hands-on experience in application support.
Experience with AWS services including EC2, IAM, Subnets, VPC, CloudFormation, S3, SNS, Redshift, CloudWatch, SQS, Route53, ECR, EKS, Lambda, Kinesis, and RDS; achieved high availability and fault tolerance for EC2 instances using Elastic IP, EBS, and ELB.
Experienced in AWS networking and compute services, including VPC, EC2, AWS Lambda, IAM Role-Based Access Control (RBAC), Amazon RDS (PostgreSQL/MySQL), Amazon Elastic Container Registry (ECR) for private Docker images, and Amazon EKS for Kubernetes-based deployments.
Implemented a CI/CD pipeline with Docker, Jenkins, GitHub, and AWS EKS: whenever a new TFS/GitHub branch is created, Jenkins, the Continuous Integration (CI) server, automatically attempts to build a new Docker container from it.
Worked with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their application (CI/CD) to deploying either on public or private cloud.
Converted existing Terraform modules that had version conflicts to CloudFormation templates during deployments; worked with Terraform to create stacks in AWS and updated Terraform scripts regularly as requirements changed.
Experienced in AWS IaaS and PaaS, including provisioning EC2 instances, managing Amazon EBS volumes, configuring VPCs and subnets, deploying web applications on AWS Elastic Beanstalk and AWS Lambda, and automating background tasks with AWS Batch and Step Functions
Experience in cloud automation and orchestration frameworks using AWS. Implemented multi-tier application provisioning in an OpenStack cloud, integrating it with Ansible and building the application with Maven.
Highly motivated and committed DevOps Engineer experienced in Automating, configuring and deploying instances on AWS cloud environments and Data centers.
Expertise in PLM systems, data management, and process optimization helps organizations maintain product integrity, reduce time-to-market, ensure compliance, and enhance overall product quality.
Experience in deploying Kubernetes clusters on AWS with a master/worker architecture; wrote many YAML files to create services such as pods, deployments, auto-scaling, load balancers, labels, health checks, namespaces, ConfigMaps, etc.
Experience in integrating Jenkins with tools like Maven (build), Git (repository), SonarQube (code verification), and Nexus (artifact repository); implemented CI/CD automation by creating Jenkins pipelines programmatically and architecting Jenkins clusters.
Experience in working on version control systems like GIT and used Source code management client tools like Git Bash, GitHub, Git GUI and other command line applications.
Experience with ELK architecture and its implementation; handled installation, administration, and configuration of the ELK stack on AWS and performed log analysis.
Experience in deploying and configuring Elasticsearch, Logstash, and Kibana (ELK) and AWS Kinesis for log analytics; experienced in monitoring servers using Nagios, Splunk, and CloudWatch.
Worked on customizing Splunk dashboards, visualizations, configurations, reports and search capabilities using customized Splunk queries.
Expert in writing Groovy, Bash/Shell, and Java scripts, along with JSON and YAML configuration, to automate the build and release process.
Exposure to all aspects of Software Development Life Cycle (SDLC) such as Analysis, Planning, Development, Testing, Implementation, Post-production analysis of the projects.
Knowledge of the Java ecosystem, including Core Java, Java EE (JSP, Servlets), Spring Framework (Spring Boot, Spring MVC), Hibernate, and microservices architecture. Experienced in Project analysis, gathering user requirements, technical design.
Implemented traffic splitting and retries policies within Istio, improving service uptime and reducing error rates across distributed applications.
Integrated Datalog logic into cloud-native platforms using tools like Open Policy Agent (OPA) and Rego (a Datalog-inspired language). Highly organized and detail-oriented; able to plan, prioritize work, and meet deadlines.
Applied Datalog in security policy enforcement, compliance validation, and infrastructure governance in cloud environments (AWS/Azure).
Ability to work directly with all levels of management to gather user requirements.
Experience with SDLC methodologies including Waterfall, SAFe Agile, and DevOps.
Education Details:
Bachelor of Technology from Jawaharlal Nehru Technological University.
Technical Skills
Cloud Environments
Microsoft Azure, Amazon Web Services (AWS)
Version Control Tools
Git, GitHub, GitLab
CI/CD Tools
Jenkins, Circle CI
Build tools
Maven, SonarQube, Gradle
Automation Tool
Ansible, Terraform
Monitoring Tools
Nagios, Splunk, ELK, CloudWatch, Prometheus, Grafana.
Container Tools
Docker, Kubernetes, Mesos, OpenShift, AWS ECS
Bug Tracking Tools
JIRA, Remedy, HP Quality Center, IBM Clear Quest.
Database
MySQL, MS SQL, Oracle, DynamoDB, MongoDB 7, SQL Server
Scripting & Programming Languages
Shell Scripting, Java, XML, PL/SQL, HTML, Groovy
Application/Web Servers
WebLogic, WebSphere, Apache Tomcat, Nginx
Operating Systems
Solaris, CentOS, Ubuntu, and RHEL
Professional Experience:
Sr AWS DevOps Lead
TSC (Tractor Supply Co.) USA, INDIA Aug 2024 to Present
Responsibilities:
Working on various AWS services like EC2, Auto Scaling, S3, Elastic Load Balancing, Route53, AWS Aurora, RDS, VPC, CloudWatch, IAM, EKS, EMR, and Lambda.
Source code management using GitHub repository to trigger Jenkins Jobs for software delivery process as CI/CD.
Utilized Maven and NPM to compile source code, run tests, and create software packages optimized for deployment.
Creating Jenkins pipelines and setting up Jenkins agents to automate the deployment of applications to on-premises servers and Amazon EC2 instances, ensuring dependable and consistent releases.
Used Maven build tool plugin for java related projects, managed artifacts in JFrog Artifactory.
Enabled scanning on the registry to scan vulnerabilities in Docker images on Prod environments.
Automated OS patching and user management tasks on instances by creating Ansible playbooks.
Utilizing Terraform modules to create and modify the infrastructure on the AWS cloud & Azure Cloud.
Wrote Dockerfiles as per requirements, built Docker images, and uploaded them to ECR.
Created AWS Lambda Deployment function and configured it to receive events from AWS S3 buckets.
Expertise in writing Kubernetes (EKS, AKS) manifest files for deployments, pods, secrets, and services.
Deploying application versions to the Kubernetes (K8s) environment following Blue/Green and Rolling deployment strategies depending on the environment.
Deployed applications into EKS (AWS managed K8s service) and maintained high availability of workloads by creating Deployments and ReplicaSets for the Production environment.
Developed serverless applications using AWS Lambda with Java, implementing business logic in response to AWS services such as S3, DynamoDB, and SNS.
Integrated AWS Lambda with API Gateway for creating RESTful APIs to reduce infrastructure overhead.
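The Lambda-plus-event-source pattern above can be sketched as a small handler. This is an illustrative sketch in Python (the resume mentions Java for Lambda; Python is used here for brevity), with a hypothetical S3-event payload shape matching AWS's standard record format:

```python
import json

def extract_s3_objects(event):
    """Pull (bucket, key) pairs out of a standard S3 event payload."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
    ]

def lambda_handler(event, context):
    """Entry point: return an API Gateway-style proxy response."""
    objects = extract_s3_objects(event)
    return {
        "statusCode": 200,
        "body": json.dumps({"processed": [key for _, key in objects]}),
    }

if __name__ == "__main__":
    # Local smoke test with a synthetic event (no AWS credentials needed)
    sample = {"Records": [{"s3": {"bucket": {"name": "demo-bucket"},
                                  "object": {"key": "reports/q1.csv"}}}]}
    print(lambda_handler(sample, None))
```

Keeping the event parsing in a pure function lets the trigger logic be unit tested without deploying to AWS.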
Implemented and managed a service mesh using Istio for traffic routing and load balancing.
Configured Istio controller and Envoy proxy on Kubernetes to expose the services.
Automated the deployment of new releases of applications to K8s using Argo CD.
Wrote Groovy and shell scripts to automate the build process and administration jobs.
Worked with the Helm package manager to write Helm charts and deploy reusable manifest files in K8s.
Installed, configured, and maintained Apache/Nginx Web Server on Ubuntu instances.
Monitoring production errors and providing the RCA by analyzing the Splunk logs.
Setup Dynatrace monitoring across servers and various AWS services as alert manager.
Experience with leading PLM platforms (Teamcenter) and knowledge of how to configure and implement them.
Experience in deploying microservices based applications.
Knowledge of scripting languages (Shell scripting, Groovy) for automation tasks and deployment.
Experience with Jenkins, GitLab or other CI/CD tools for automating build and deployment pipelines.
Proficiency in Git and version management systems for maintaining code and configuration repositories.
Worked closely with DevOps and security teams to translate business policies into logic rules for continuous validation and monitoring.
Developed custom inference engines using Datalog to support authorization logic, role hierarchies, and rule-based decision systems.
Participating in on-call rotations to maintain system reliability and resolve critical issues as SRE Engineer.
Identify root causes of recurring problems to implement effective solutions.
Implemented observability, monitoring, and alerting solutions by Prometheus and Grafana.
Experienced working on software release projects using the Agile approach and Jira for tasks.
AWS DevOps Engineer
HDFC BANK, India Feb 2022 to Aug 2024
Responsibilities:
Deployed AWS solutions using VMs, VNets, EC2, S3, EBS, Elastic Load Balancer (ELB), Auto Scaling groups, and OpsWorks.
Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware virtual machines as required in the environment.
Created S3 buckets, defined IAM role-based policies, and assigned them to cloud instances.
Automated the deployment of patches and updates across hundreds of servers using custom Bash scripts, reducing manual intervention and ensuring compliance with security policies.
Implemented a disaster recovery solution utilizing AWS as the primary cloud and GCP as the secondary. This setup ensured business continuity by enabling rapid failover and data recovery in case of an AWS outage.
Designed CI/CD pipelines that use Dockerfiles and Makefiles to build Docker images and validate containers using entry points.
Designed automation scripts for cloud provisioning, deployment pipelines, and configuration management using YAML in collaboration with Terraform and Ansible.
Led the adoption and administration of GitHub Enterprise, enabling streamlined code management and collaboration across multiple development teams. Implemented advanced security protocols, branch protection rules, and access controls to safeguard code integrity and comply with industry standards.
Developed a system for seamless data migration and synchronization between AWS S3 and Google Cloud Storage. This allowed for efficient data handling and accessibility, enhancing the overall data management strategy.
Established a robust code review process using GitHub’s pull request and review features, significantly improving code quality and developer productivity. Integrated static code analysis tools like SonarQube within GitHub to automate code quality checks.
Designed and managed Horizontal Pod Autoscaler (HPA) configurations for high-availability services using YAML.
Automated routine administrative tasks using shell scripts (Bash, Python), such as backups, system monitoring, and log management.
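A minimal sketch of one of the routine administrative scripts described above, here a log-retention policy. The directory layout and retention window are illustrative assumptions; the policy is a pure function so it can be tested without touching a real filesystem:

```python
import os
import time

def files_older_than(entries, days, now=None):
    """Select file names whose mtime is older than `days` days.
    `entries` is a list of (name, mtime_epoch) pairs."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    return [name for name, mtime in entries if mtime < cutoff]

def purge_old_logs(log_dir, days=14):
    """Apply the retention policy to a real directory (hypothetical layout)."""
    entries = [(f, os.path.getmtime(os.path.join(log_dir, f)))
               for f in os.listdir(log_dir) if f.endswith(".log")]
    for name in files_older_than(entries, days):
        os.remove(os.path.join(log_dir, name))
```

In practice this kind of script would be scheduled via cron alongside the backup and monitoring tasks the bullet mentions.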
Wrote Ansible playbooks, the entry point for Ansible provisioning, where the automation is defined through tasks in YAML format; ran Ansible scripts to provision Dev servers.
Worked on Bitbucket, which included creating new users, branching, merging changes, and writing pre-commit and post-commit hook scripts.
Involved in setting up JIRA as defect tracking system and configured various workflows, customizations, and plugins for the JIRA bug/issue tracker.
Worked on setting up Splunk to capture and analyze data from various layers of Load Balancers, Webservers.
AWS DevOps Engineer
FEDEX CORP, USA, INDIA Oct 2020 to Jan 2022
Responsibilities:
Worked on the AWS Cloud platform and its features, including EC2, VPC, S3, AMI, SNS, RDS, CloudWatch, Auto Scaling, CloudFront, and IAM, for configuring and managing IaaS. Wrote CloudFormation scripts to deploy AWS infrastructure components for the respective services and managed the resources and data elements defined in the scripts.
Designed AWS CloudFormation templates (CFT) to create custom sized VPC, subnets, NAT to ensure successful deployment of Web applications and database templates in AWS Cloud.
Worked on CI/CD pipelines using Jenkins to build, test, deploy microservices containers on Kubernetes clusters using Ansible on DEV, UAT, PROD environment.
Skilled in troubleshooting complex YAML structures and ensuring compatibility across diverse cloud environments.
Streamlined multi-environment deployments by creating reusable YAML templates for Kubernetes Config Maps, secrets, and manifests.
Managed Kubernetes charts using Helm; created reproducible builds of Kubernetes applications, Kubernetes manifest files, and releases of Helm packages. Developed a CI/CD system with Jenkins on a Kubernetes container environment, utilizing Kubernetes and Docker to build, test, and deploy, and configured Kubernetes to deploy, load balance, scale, and manage Docker containers with multiple namespaced versions.
Automated repetitive infrastructure tasks by developing robust Terraform modules integrated with CI/CD pipelines.
Managed and administered Linux/Unix systems including Red Hat, CentOS, Ubuntu, and SUSE. Tasks included OS installations, upgrades, patching, user account management, and performance tuning.
Designed and implemented a new GitHub-based framework for software delivery processes, enhancing automation, workflow orchestration, and security protocols.
Experience in integrating Terraform with Ansible, Packer to create and Version the AWS Infrastructure, designing, automating, implementing and sustainment of Amazon machine images (AMI) across the AWS Cloud environment.
Implemented Prometheus and Grafana for detailed observability and monitoring of Kubernetes workloads.
Deployed and managed enterprise applications on IBM WebSphere Application Server (WAS) and IBM HTTP Server (IHS), ensuring high availability and reliability.
Developed and standardized GitHub Actions and workflows, integrating SonarQube for continuous code quality checks and security scanning.
Successfully integrated Terraform with cloud services to manage multi-cloud environments with dependencies.
Managed deployments of microservices in containers using Docker and Docker Compose, and used Docker Swarm orchestration built into Jenkins for continuous deployments into various environments.
Developed automated workflows in ServiceNow to streamline IT processes and improve efficiency.
Led the automation of build and deployment pipelines, significantly reducing manual efforts and improving release cycles.
Created Ansible playbooks for automation tasks such as file copies, permission changes, configuration changes, and path-specific folder creation; wrote playbooks for provisioning, orchestration, packages, services, configuration, and deployments.
Successfully integrated GitHub EMU with corporate identity providers (like Active Directory, SAML, or LDAP), streamlining the authentication process. This enhanced security by enabling single sign-on (SSO) and automated user provisioning/deprovisioning.
Created custom reports and dashboards in ServiceNow to track key performance indicators (KPIs) and support data-driven decision-making.
Extensively managed GitHub Enterprise, including setting up repositories, branches, permissions, and continuous integration/continuous deployment (CI/CD) pipelines using GitHub Actions.
Implemented a Lambda function that triggers in real time on transaction activities; it analyzes transaction patterns using machine learning models to flag and report potentially fraudulent activities, significantly reducing fraud incidence.
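The flagging step in the transaction-monitoring function above can be sketched in miniature. The real models and thresholds are not described in this resume; the simple z-score rule below is an illustrative stand-in for whatever the production model computed:

```python
def is_suspicious(amount, history, z_threshold=3.0):
    """Flag a transaction whose amount deviates strongly from the
    account's recent history (z-score stand-in for the ML model)."""
    if len(history) < 2:
        return False  # not enough history to judge
    mean = sum(history) / len(history)
    var = sum((x - mean) ** 2 for x in history) / (len(history) - 1)
    std = var ** 0.5
    if std == 0:
        return amount != mean
    return abs(amount - mean) / std > z_threshold
```

In the Lambda, a function like this would run per incoming transaction event, with flagged items published to SNS or written back to a review table.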
Responsible for installing Jenkins master and slave nodes and configuring Jenkins builds for continuous integration and delivery pipelines; used Jenkins builds for continuous integration and deployment into the Tomcat application server.
Optimized WebSphere performance through JVM tuning, connection pool management, and thread tuning to handle high-load scenarios effectively.
Transformed a monolithic application into a set of microservices, containerizing each service using Docker for deployment on ECS.
Implemented Splunk for end-to-end application performance monitoring, enabling real-time visibility into application health and performance.
Implemented traffic splitting and retries policies within Istio, improving service uptime and reducing error rates across distributed applications.
Implemented mutual TLS (mTLS) for service-to-service communication, ensuring secure, encrypted traffic within the service mesh and protecting sensitive data in transit.
Performed regular monitoring activities on Unix/Linux servers, such as log verification, CPU usage, memory checks, load checks, and disk space verification, to ensure application availability and performance using Dynatrace and Zabbix.
Developed a Lambda-based solution to process large volumes of financial data; the function was triggered by data upload events to S3, efficiently aggregating and transforming the data for real-time financial reporting.
Applied best practices for security within the GitHub EMU framework, such as enforcing two-factor authentication, setting up required status checks before merging, and managing deploy keys for secure repository access.
Integrated a CI/CD pipeline (using Jenkins/AWS Code Pipeline) for automated testing and deployment of containerized applications to ECS.
Demonstrated expertise in building robust CI/CD pipelines using AWS Code Pipeline, Code Deploy, and Code Build, orchestrated through AWS CloudFormation. This streamlined deployment processes, ensuring consistent and reliable application updates.
Wrote and optimized complex SQL queries and stored procedures to improve database performance and support application requirements.
Integrated Jenkins with Nexus, SonarQube, Ansible and used CI/CD within Jenkins on Docker container environment, utilizing Docker for the runtime environment for the CI/CD system to build, test and deploy in AWS and private cloud (on-premises).
Implemented AWS Lambda functions to automate compliance checks on transaction records in DynamoDB, ensuring continuous adherence to financial regulations. This automation significantly reduced the need for manual oversight, saving substantial time and resources.
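A compliance check of the kind described above can be sketched as rules over records. The field names and rules here are hypothetical (the actual regulations are not stated in the resume); in production the records would come from a DynamoDB scan rather than an in-memory list:

```python
REQUIRED_FIELDS = {"transaction_id", "amount", "timestamp", "customer_id"}

def compliance_violations(record):
    """Return the list of rule violations for one transaction record."""
    violations = [f"missing:{f}" for f in sorted(REQUIRED_FIELDS - record.keys())]
    if "amount" in record and record["amount"] <= 0:
        violations.append("nonpositive_amount")
    return violations

def scan_records(records):
    """Map transaction id -> violations for every non-compliant record.
    (A real Lambda would iterate a DynamoDB paginator here.)"""
    report = {}
    for rec in records:
        v = compliance_violations(rec)
        if v:
            report[rec.get("transaction_id", "?")] = v
    return report
```

Running such a function on a schedule (e.g. via EventBridge) replaces the manual oversight the bullet mentions.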
Administered and Engineered Jenkins for managing weekly Build, Test and Deploy chain as a CI/CD process, GIT with Development/Test/Prod Branching Model for weekly releases.
Designed and implemented a comprehensive infrastructure as code (IaC) strategy using tools like Terraform and Ansible, reducing manual setup processes by 50%.
Created Docker files for front-end applications built with AngularJS and back-end applications using Java, ensuring consistency across development, staging, and production environments.
Configured and maintained Jenkins to implement the CI process, integrated it with Ant and Maven to schedule builds, used JIRA with the Maven release plugin for defect and bug tracking, and used Nagios for monitoring web applications, web services, URLs, content, and HTTP status.
Managing multiple corporate applications in GitHub/Bitbucket code management repositories and creating & granting access for users related to GIT/Bitbucket project directories for the code changes.
Collaborated with cross-functional teams to resolve issues related to networking, service communication, and service discovery within the Istio service mesh.
Supporting 24x7 production on-call and weekend support computing environments.
DevOps Engineer
UHG, USA, INDIA Mar 2019 to Sep 2020
Responsibilities:
Experience in software integration, configuration, building, automating, managing, and releasing code from one environment to another and deploying it to servers.
Worked in an AWS environment, utilizing Compute services (EC2, ELB), Storage services (S3, Glacier, Block Storage, lifecycle management policies), CloudFormation, Lambda, VPC, RDS, and CloudWatch.
Managed and maintained DB2, SQL Server, and Oracle databases, including backup, recovery, and performance tuning.
Migrated a Linux environment to AWS by creating and executing a migration plan; deployed EC2 instances in a VPC, configured security groups and NACLs, and attached profiles and roles using AWS CloudFormation templates.
Used Amazon Route53 to manage DNS zones globally and give public DNS names to ELBs, and CloudFront for content delivery.
Experience in implementing AWS Lambda to run code without managing servers, triggered by S3 and SNS.
Implemented monitoring and alerting solutions using Prometheus and Grafana, improving incident response time by 25%.
Spearheaded the end-to-end migration of enterprise-scale infrastructure from Azure DevOps to GitHub Enterprise, ensuring a seamless transition with zero downtime.
Established a secure and reliable network connection between AWS and GCP using interconnects and VPNs. This facilitated smooth data transfer and communication between services hosted on both cloud platforms.
Executed data migration projects, ensuring data integrity and minimal downtime during transitions.
Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins along with PowerShell to automate routine jobs.
Developed shell scripts for automation of the build and release process, developed Custom Scripts to monitor repositories, Server storage.
Orchestrated a Kubernetes environment that spans across AWS and GCP, ensuring high availability and scalability. Utilized AWS EKS and Google Kubernetes Engine for a unified application deployment strategy.
Utilized Grafana and Prometheus for detailed observability in Kubernetes environments, leading to faster issue resolution and system optimization.
Created Ansible playbooks to automatically install packages from a repository, change the configuration of remote machines, deploy new builds, and handle various automation tasks such as file copies, permission changes, configuration changes, and path-specific folder creation.
Used Ticketing tool JIRA to track defects and changes for change management, monitoring tools like New Relic and CloudWatch in different work environments in real and container workspace.
Design enterprise patterns that are repeatable and consistent with regards to deployment and configuration of their respective systems. Patterns must encompass pre-production testing, performance tuning, technical hand-off documentation and environment validation.
DevOps Engineer
CSI Sdn. Bhd. Malaysia (Kuala Lumpur) Dec 2017 to March 2019
Responsibilities:
Deployed AWS solutions using EC2, S3, EBS, Elastic Load Balancer (ELB), Auto Scaling groups, and OpsWorks.
Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware virtual machines as required in the environment.
Created S3 buckets, defined IAM role-based policies, and assigned them to cloud instances.
Created Python scripts to fully automate AWS services, including web servers, ELB, CloudFront distributions, databases, EC2, database security groups, and application configuration; the scripts create stacks, single servers, or join web servers to stacks.
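A provisioning script of the kind described above typically wraps boto3. The sketch below keeps the request-building logic as a pure function so it can be tested offline; the AMI, subnet, and security-group identifiers are placeholders, not values from the actual project:

```python
def ec2_request(ami_id, instance_type, subnet_id, security_group_ids, count=1):
    """Build the keyword arguments for boto3's ec2.run_instances call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "SubnetId": subnet_id,
            "Groups": security_group_ids,
            "AssociatePublicIpAddress": False,
        }],
    }

# Actual provisioning (requires boto3 and AWS credentials):
# import boto3
# ec2 = boto3.client("ec2")
# ec2.run_instances(**ec2_request("ami-0abcd1234", "t3.micro",
#                                 "subnet-111", ["sg-222"]))
```

Separating "what to launch" from "launching it" is what makes scripts like this reusable across stacks and single servers.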
Implemented a disaster recovery solution utilizing AWS as the primary cloud and GCP as the secondary. This setup ensured business continuity by enabling rapid failover and data recovery in case of an AWS outage.
Set up custom alerts and reports in SiteScope to ensure timely detection and resolution of issues.
Conducted training sessions for staff on HIPAA compliance and best practices for data protection.
Implemented a Continuous Delivery framework using Jenkins and Maven in a Linux environment; created virtual environments via Vagrant with Chef client provisioning.
Designed CI/CD pipelines that use Dockerfiles and Makefiles to build Docker images and validate containers using entry points.
Monitored a large-scale deployment of web servers and databases using SiteScope, setting up alerts that helped proactively manage and prevent potential downtime.
Protected health information (PHI) by enforcing HIPAA security and privacy rules.
Wrote Ansible Playbooks with Python SSH as the Wrapper to Manage Configurations of AWS Nodes and Test Playbooks on AWS instances using Python.
Used SiteScope data to perform capacity planning and optimize resource allocation.
Wrote Ansible playbooks, the entry point for Ansible provisioning, where the automation is defined through tasks in YAML format; ran Ansible scripts to provision Dev servers.
Worked on Bitbucket, which included creating new users, branching, merging changes, and writing pre-commit and post-commit hook scripts.
Involved in setting up JIRA as defect tracking system and configured various workflows, customizations, and plugins for the JIRA bug/issue tracker.
Worked on setting up Splunk to capture and analyze data from various layers of Load Balancers, Webservers.
Installed and configured DHCP, DNS, web servers like Apache and IIS, mail servers like SMTP, IMAP, and POP3, and file servers.
Application Support Engineer
HSBC Sdn. Bhd. Malaysia (Kuala Lumpur) July 2014 to Dec 2017
Responsibilities:
Releasing code to test regions or staging areas according to the schedule published.
Involved in the development of both the service and presentation layers.
Involved in analyzing issues on HSBC modules and helping in providing solutions to client development and L2/L3 teams.
Followed Agile methodology; fixed issues found during QA testing.
Involved in the code deployment process and running the PCF pipeline.
Worked with finance users to understand their requirements and implemented them in the batch.
Integrated the project with an MQ server to receive messages from the mainframe server on Linux servers, and used those messages to build a BPM dashboard.
Implemented SSL certificates on Essbase and MQ servers.
Migrated Control-M jobs from v7 to v9 during the upgrade process.
Integrated Control-M XML logs to extract data and feed the BPM dashboard.
Created quantitative and control resources to balance workload on servers.
Raised PMRs with IBM to resolve Cognos and MQ issues, and worked closely with them to resolve issues within the timeline.
Ran the complete UAT batch cycle (end to end) without much dependency.
Set up and automated the PwC user quarterly report request process.
Creating build monitoring dashboards and alerts on Splunk for different modules.
Liaised with team members to achieve daily, weekly, and monthly management goals.
Fixed issues with cross-site vulnerabilities for legacy modules.
Supervised system modifications and updated legacy systems to the latest technology wherever required.
Application Support Engineer
Musoft Solutions Sdn. Bhd. Malaysia (Kuala Lumpur) Aug 2013 to July 2014
Responsibilities:
Monitor batch jobs, systems, and applications using tools like Control-M, Splunk, ELK Stack, or other log aggregation tools.
Identify and respond to alerts related to job failures, performance issues, or system incidents. Attempt basic remediation actions (restart jobs, clear logs, etc.).
Log incidents in systems like ServiceNow, maintaining proper documentation of the issue, resolution steps, and resolution time.
Inform stakeholders of job failures or critical incidents and escalate as needed to L2/L3 support teams.
Monitor batch jobs scheduled via Control-M and ensure they run as expected.
Perform standard recovery procedures (restarting jobs, re-running failed jobs).
Analyze logs and identify the reason behind failures, escalate for root cause analysis if necessary.
Use Splunk monitoring tools to identify patterns or issues in logs.
Look for early signs of issues and escalate if needed.
Monitor database health and performance.
Execute basic SQL queries for validation, data retrieval, and troubleshooting.
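The validation and troubleshooting queries mentioned above can be illustrated with a small, self-contained example. The table names (`batches`, `jobs`) are hypothetical stand-ins for the batch-processing schema, and SQLite is used here only so the sketch runs anywhere:

```python
import sqlite3

def row_count(conn, table):
    """Basic validation query: how many rows landed in a table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def orphan_jobs(conn):
    """Troubleshooting query: jobs whose batch_id has no parent batch row."""
    return [r[0] for r in conn.execute(
        "SELECT j.id FROM jobs j LEFT JOIN batches b ON j.batch_id = b.id "
        "WHERE b.id IS NULL")]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE batches (id INTEGER PRIMARY KEY);
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, batch_id INTEGER);
        INSERT INTO batches VALUES (1);
        INSERT INTO jobs VALUES (10, 1), (11, 2);
    """)
    print(row_count(conn, "jobs"), orphan_jobs(conn))  # 2 [11]
```

Queries like the orphan check are typical first steps when a Control-M job fails mid-batch and referential gaps need to be found before a re-run.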
Releasing code to test regions or staging areas according to the schedule published.
Assist in the execution of UAT tests, ensuring batch jobs and systems meet functional requirements.
Log defects in systems like HP ALM and escalate issues to L2 support when necessary.