Nirosha Ch.
+1-980-***-**** ****************@*****.*** www.linkedin.com/in/nirosha-ch-devops
SUMMARY
Sr. DevOps Engineer with 10+ years of experience at Fortune 500 organizations in sectors such as retail, manufacturing, insurance, and banking.
Specialized in Azure & AWS cloud, CI/CD automation, and infrastructure management.
Experienced in scripting (Shell, Perl, PowerShell), managing Linux and Windows systems, and automating deployments using Terraform, Docker, Kubernetes, Azure DevOps, Jenkins, and GitHub Actions.
Experience in Azure Cloud Services (PaaS, IaaS & SaaS), Storage, Web Apps, Active Directory, Azure Virtual Network Manager, Traffic Manager, Azure Monitoring, OMS, Key Vault, Cognitive Services (LUIS).
Experience in installing, configuring, and administering the Jenkins CI tool, using Jenkins pipelines for microservice builds and deployment to Docker Registry and Kubernetes.
Experienced in AWS Cloud Services including S3, EC2, EBS, VPC, RDS, SES, ELB, Auto Scaling, CloudFront, CloudFormation, ElastiCache, API Gateway, Route 53, Glacier, and CloudWatch.
Worked on Kubernetes for deploying, scaling, and managing Docker containers, and experience with managed Kubernetes services like AKS, EKS and GKE.
Excellent communication, teamwork, inter-team coordination, and team leadership skills, with experience working closely with cross-geography teams.
TECHNICAL SKILLS:
Skills Details
Cloud Platforms: Microsoft Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP)
CI/CD Tools: Azure DevOps, Jenkins, GitHub Actions
Infrastructure as Code (IaC): Terraform, ARM Templates
Application/Web Servers: Apache Tomcat, Nginx, WebSphere, JBoss, WebLogic
Containerization: Docker, Kubernetes, Azure Container Registry (ACR), OpenShift, ECS, ACS
Scripting Languages: Bash, Shell, Perl, PowerShell, Python, Groovy
Monitoring & Logging Tools: Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Dynatrace, Prometheus
Automation Tools: Ansible, Terraform, CloudFormation
Code Quality & Security: SonarQube, Quay (Container Scanning), Allure Reports
Version Control Systems: Git (GitHub, Azure Repos)
Artifact Management: Azure Artifacts, JFrog, Nexus Repository Manager
Operating Systems: Linux (Ubuntu, RHEL), Windows Server
Ticketing and Project Tracking Tools: Azure Boards, ServiceNow, MS SharePoint, MS Visio, MS Office, HP ALM, JIRA, Confluence
Networking/Protocols: NIS, NFS, DNS, DHCP, WAN, SMTP, LAN, FTP/TFTP
EXPERIENCE
Clients:
Hershey's Oct 2022 to Present
The Hershey Company is an American multinational confectionery company headquartered in Hershey, Pennsylvania, and one of the largest chocolate manufacturers in the world.
Role: Sr. Azure DevOps Engineer
Environment: Azure DevOps, Jenkins, Azure Cloud, AKS, ACR, Azure App Services, Azure Boards, Dynatrace, ARM Templates, Terraform, Autosys, Grafana, ELK, Prometheus, SonarQube, Jira, ServiceNow
Key Contributions:
Designed and implemented end-to-end Azure DevOps strategies, establishing robust CI/CD pipelines using Azure DevOps and Jenkins to streamline build, deployment, and automation workflows.
Managed cloud infrastructure using Azure services such as Azure VNet, VPN Gateway, ExpressRoute, App Services, and Azure Kubernetes Service (AKS), ensuring high availability, scalability, and security.
Implemented and automated deployments into Azure Kubernetes Service (AKS) clusters, scripting deployment processes using Bash and PowerShell for Java, Node.js, Quarkus, and React-based applications.
Improved visibility and performance analysis of the CI/CD pipeline by integrating advanced monitoring and logging tools (Grafana, ELK Stack, Prometheus) and code quality tools (SonarQube, Quay scanning, Allure reporting).
Automated infrastructure provisioning and configuration management using Terraform and Azure CLI, significantly reducing deployment cycles and infrastructure setup times.
Actively involved in Agile project management practices using Jira and Azure Boards, conducting daily scrums, sprint planning, and task management to ensure smooth collaboration across teams.
Migrated tracking tools from Task-top Jira to Azure Boards, and conducted proofs of concept (POCs) using dependency automation tools (Dependabot, Renovate) and decorators to enhance workflow efficiency.
Maintained technical documentation and KPI tracking, giving stakeholders visibility into deployment metrics, system health, and team productivity.
Migrated legacy shell-scripted cron jobs to Autosys, improving job monitoring, alerting, and centralized management.
Integrated Autosys job runs into CI/CD pipeline workflows to ensure smooth deployment-to-scheduling transitions.
Maintained and scheduled Autosys jobs that supported cloud-hosted services (deployed via AKS and App Services).
Collaborated with Azure DevOps teams to align job scheduling windows with CI/CD deployment schedules for minimal downtime.
Troubleshot batch job failures post-deployment using job logs, autorep, and environment diagnostics.
Implemented end-to-end monitoring solutions for microservices using Dynatrace, configuring detailed alerting, metrics, synthetic monitoring, and dashboards to proactively detect and resolve performance bottlenecks.
Developed and maintained Azure DevOps and GitHub Actions CI/CD pipelines, automating deployments of containerized microservices into Azure Kubernetes Service (AKS) clusters.
Managed container registry operations via Azure Container Registry (ACR) for streamlined container image storage, tagging, versioning, and lifecycle management.
Deployed web applications and microservices securely onto Azure App Services, handling scaling, environment-specific configurations, and SSL certificates.
Collaborated closely with development teams to integrate application monitoring within CI/CD workflows, reducing troubleshooting time and improving overall system reliability.
Participated in daily Agile scrums, providing insights and status updates on monitoring effectiveness, deployment pipelines, and infrastructure health.
Managed daily Azure cloud infrastructure operations, handling BAU support tasks such as provisioning, scaling, and troubleshooting infrastructure services like Azure Virtual Machines, Azure Storage, Azure Backup, and Azure Virtual Networks.
Extensively utilized Azure Resource Manager (ARM) templates to provision and configure Azure infrastructure resources, enabling quick, repeatable, and standardized deployments across various environments.
Managed incidents, change requests, and problem tickets using ServiceNow, ensuring timely resolution of infrastructure-related issues within defined SLA targets.
Coordinated regular OS patching, vulnerability remediation, and compliance checks on Azure infrastructure, adhering to organizational security policies.
Collaborated proactively with cross-functional teams (Application, Security, Networking) for infrastructure enhancements, security audits, and disaster recovery planning, ensuring minimal downtime and maximum availability.
Utilized Ansible to automate infrastructure provisioning, configuration management, and application deployments, reducing manual intervention and ensuring consistency across Linux environments.
Managed configuration drift and ensured compliance across servers using Ansible playbooks, enhancing infrastructure stability and reliability.
Documented comprehensive infrastructure processes, SOPs, and guidelines for efficient infrastructure management and seamless knowledge transfer within the team.
Toro April 2020 to Oct 2022
The Toro Company is an American company based in Minneapolis, Minnesota that designs, manufactures, and markets lawn mowers, snow blowers, and irrigation system supplies for commercial, residential, agricultural, and public sector uses.
Role: Site Reliability Engineer
Environment: AWS (EC2, S3, Lambda, RDS, EKS, CloudFormation), Kubernetes, Docker, Terraform, Jenkins, GitHub, CloudWatch, Prometheus, Grafana, ELK Stack, Ansible, Python, Bash, Linux.
Key Contributions:
Designed and deployed highly available applications using AWS services such as EC2, Route53, S3, RDS, SNS, SQS, and DynamoDB with a focus on auto-scaling, failover, and fault tolerance.
Managed and troubleshot AWS cloud resources, including EC2 instances, S3 buckets, VPC configurations, and Elastic Load Balancers (ELBs) to ensure consistent uptime and performance.
Automated infrastructure provisioning with Terraform, enabling repeatable deployments and scalability for multi-tier applications hosted in AWS.
Provisioned and managed scalable cloud infrastructure in AWS using Terraform and CloudFormation, supporting high-availability production systems.
Deployed and maintained Kubernetes clusters (EKS) for container orchestration, implementing pod auto-scaling, network policies, and Helm-based deployments.
Built robust CI/CD pipelines using Jenkins and GitHub Actions to automate build, test, and deployment workflows across multiple environments.
Improved application observability by configuring Prometheus, Grafana, and CloudWatch for detailed metrics, alerting, and performance dashboards.
Automated operational tasks using Ansible, Python, and Bash scripts, reducing manual intervention in system management and application deployment.
Used JFrog Artifactory to store and maintain artifacts in binary repositories, pushing new artifacts from Jenkins projects configured with the Jenkins Artifactory Plugin.
Implemented centralized logging using the ELK stack (Elasticsearch, Logstash, Kibana), enabling faster root cause analysis across distributed microservices.
Managed secrets and secure configuration through HashiCorp Vault and AWS IAM, enforcing best practices in secure DevOps pipelines.
Supported migration of monolithic apps to containerized microservices using Docker and Kubernetes, enhancing deployment speed and scalability.
Collaborated with developers and QA teams to define SLOs and SLIs, aligning monitoring and alerting with business performance goals.
Conducted infrastructure performance tuning and cost optimization across EC2, RDS, and Lambda services through detailed usage and billing analysis.
Reduced deployment failures by 70% by standardizing CI/CD pipelines using Jenkins and GitHub Actions, with integrated testing and rollback capabilities.
Accelerated infrastructure provisioning time by 60% by migrating manual processes to automated Terraform-based Infrastructure as Code (IaC) workflows.
Improved system observability by implementing a centralized monitoring and logging stack using Prometheus, Grafana, CloudWatch, and ELK, enabling faster incident detection and resolution.
Enhanced Kubernetes reliability and efficiency by optimizing resource usage, configuring auto-scaling, and implementing Helm-based deployment strategies.
Decreased manual toil by scripting 20+ operational automations in Python and Bash, including log collection, service restarts, and health checks.
Strengthened cloud security posture by automating IAM policy enforcement, integrating secrets management with Vault, and conducting periodic access audits.
Contributed to successful migration of legacy systems into containerized Kubernetes workloads, reducing deployment time from hours to minutes.
Supported compliance and governance by building audit-ready infrastructure templates, tagging standards, and monitoring dashboards aligned with internal policies.
Mercury General Insurance Feb 2018 to April 2020
Mercury General Corporation is a multiple-line insurance organization that offers personal automobile, homeowners, renters, and business insurance. Mercury's primary focus is automobile and homeowners insurance.
Role: AWS DevOps Engineer
Environment: Team: AWS Infrastructure Support, AWS Cloud.
Key Contributions:
Deployed AWS solutions using EC2, S3, EBS, Elastic Load Balancer (ELB), Auto Scaling groups, and OpsWorks.
Planned, deployed, monitored, and maintained AWS cloud infrastructure consisting of multiple EC2 nodes and VMware virtual machines as required in the environment.
Created S3 buckets and IAM role-based policies and assigned them to cloud instances.
Created Python scripts to automate AWS services, including web servers, ELB, CloudFront distributions, databases, EC2 and database security groups, and application configuration; the scripts create stacks, single servers, or join web servers to stacks.
Integrated CDK deployments seamlessly into CI/CD pipelines, enabling automated testing, validation, and deployment of infrastructure changes alongside application code.
Configured auto-scaling and load balancing features of AWS Elastic Beanstalk to dynamically scale applications based on traffic demands, ensuring optimal performance and availability.
Implemented a Continuous Delivery framework using Jenkins, Chef, and Maven in a Linux environment. Created virtual environments via Vagrant with chef-client provisioning.
Designed CI/CD pipelines that use Dockerfiles to build Docker images and validate containers using entry points.
Worked on implementation of Elastic Stack (ELK) solutions, including Elasticsearch, Logstash, and Kibana, to centralize and analyze log data effectively.
Implemented continuous integration using Jenkins. Configured Jenkins security and added multiple slave nodes for continuous deployments.
Implemented Ansible to manage all existing servers and automate the build/configuration of new servers.
Built Jenkins pipelines to push all microservice builds to the Docker registry and deploy them to Kubernetes; created and managed Pods using Kubernetes.
Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy. Created Jenkins jobs to deploy applications to the Kubernetes cluster.
Hands-on experience using ELK (Elasticsearch, Logstash, Kibana), Splunk, and Nagios to collect usage data for each application.
Automated the deployment process by writing Perl and Python scripts in Jenkins.
Extensive experience in CentOS/RHEL/Unix system administration, including system and server builds, installations, upgrades, patches, migrations, and troubleshooting of server issues on RHEL 4.x/5.x and CentOS.
Worked on user requests via the ticketing system (JIRA) related to system access, logon issues, home directory quotas, file system repairs, directory permissions, disk failures, and hardware and software issues.
Bharti Airtel Aug 2015 to Dec 2017
Bharti Airtel Limited is an Indian multinational telecommunications company based in New Delhi. It operates in 18 countries across South Asia and Africa, as well as the Channel Islands. Currently, Airtel provides 5G, 4G and LTE Advanced services throughout India.
Role: Build & Release Engineer
Project: IOT-CMP
Environment: Linux, DevOps, SonarQube, ServiceNow, Jira, Confluence, WebLogic, RabbitMQ, Cassandra, ELK.
Key Contributions:
Installed Jenkins on a Linux machine and created a master/slave configuration to run multiple parallel builds.
Automated various infrastructure activities such as continuous integration, continuous deployment, and application server setup.
Built a new regression environment for the application team from scratch, end to end: Apache, Redis, and Cassandra setup; Kong setup (adding routes, upstreams, plugins, services, and targets); WebLogic deployment; RabbitMQ setup (adding queues); deployment of all Docker services; and SSL certificate installation.
Worked on the ELK upgrade as part of the Apache Log4j vulnerability remediation project.
As part of the ELK disaster recovery (DR) plan, designed an active-passive DR method based on data centre backups.
Worked on the ELK (Elasticsearch, Logstash, Kibana) disaster recovery (DR) design plan for Airtel.
Developed Docker images to support the development and testing teams and their distributed Jenkins pipelines.
Built and deployed microservices into production using a blue-green deployment strategy.
Created Docker images for app isolation, reducing the time between provisioning and deployment from over 8 hours to less than 10 minutes.
Created Jenkins jobs; managed node installation, deployment configuration, administration, and backups; and installed Jenkins plugins as required by the project.
Created, configured, and managed CI/CD pipelines by integrating Git, Maven, SonarQube, and Nexus with Jenkins.
Integrated all Docker services with SonarQube and configured plugins and dependencies in pom.xml to provide code coverage reporting for the development team.
Compared configuration in config-test.yml and config-prod.yml in Bitbucket and raised PRs to merge the changes for production deployment of microservices.
Performed continuous builds and deployments to multiple environments such as Dev and QA.
Set up Amazon EC2 instances, VPCs, and security groups.
Set up databases in AWS using RDS and storage using S3 buckets, and configured instance backups to S3.
Monitored all application servers and services using Grafana.
SpanIdea Systems Pvt Ltd, Bengaluru, India. March 2014 to July 2015
Role: Linux System Administrator
Project: Surveillance B2C
Environment: Linux, Bash Shell
Key Contributions:
Automated system configuration and task execution using Bash and Shell scripting, streamlining daily administrative tasks and reducing manual intervention.
Configured, managed, and optimized Apache Tomcat servers for application deployment, ensuring efficient performance and uptime.
Proficient in writing and optimizing Bash shell scripts to automate system administration tasks such as backups, log rotation, user management, and system health checks, increasing efficiency and reducing manual intervention.
Experienced in using Bash scripting to troubleshoot and resolve system issues, including process monitoring, disk space management, and network troubleshooting, leveraging commands like grep, awk, and sed to extract and analyze log files.
Monitored system health using tools like Nagios, identifying potential issues and proactively addressing performance bottlenecks.
Provided 24x7 on-call support for Linux-based systems, troubleshooting and resolving critical incidents to ensure minimal downtime.
Collaborated with developers to deploy and maintain applications, optimizing performance through system configurations and application tuning.
Assisted in the migration of legacy applications and systems, ensuring smooth transitions with minimal disruption to operations.
Experienced in managing and optimizing server processes and resources, utilizing top, htop, and vmstat to troubleshoot high CPU and memory usage, and implementing solutions to enhance system efficiency.
Experienced in diagnosing and resolving system performance issues by analyzing logs and using tools like dmesg and journalctl to identify and fix bottlenecks, ensuring optimal system performance.
Skilled in troubleshooting network connectivity problems, utilizing ping, ifconfig, netstat, and traceroute to identify and resolve issues related to DNS, routing, and firewall configurations.
Proficient in resolving disk space and file system issues by monitoring disk usage with df and du, and running filesystem checks (fsck) to fix corrupted file systems and avoid potential system crashes.
EDUCATION & CERTIFICATIONS
Jawaharlal Nehru Technological University, Hyderabad, India, B. Tech in Electrical & Electronics Engineering.
Red Hat Certified Engineer (2019)