DevOps Engineer Team Lead

Location:
Linthicum Heights, MD
Posted:
April 30, 2025

Resume:

DevOps Engineer

NANDANARGUNAN (“Nandha”)

linkedin.com/in/nandanargunan

+1-609-***-****

**********@*******.***

Professional Summary

Results-driven DevOps Engineer and Technical Team Lead with over 11 years of experience architecting, automating, and optimizing cloud and on-premises infrastructure across AWS, GCP, Azure, and hybrid environments. Led a team of six engineers, fostering collaboration and driving operational excellence through mentorship and Agile practices. Spearheaded the automation of deployment processes using CI/CD pipelines with Jenkins, AWS CodePipeline, and Git, reducing deployment times by up to 35% and enhancing system reliability. Proficient in infrastructure-as-code with Terraform and Ansible, container orchestration with Docker and Kubernetes, and observability with OpenSearch, New Relic, and Grafana. Worked closely with development teams to streamline deployments, ensuring seamless integration and zero-downtime releases. Expert in managing Linux and Windows Server environments, virtualization with Hyper-V and VMware, and database administration for MongoDB, DocumentDB, and Elasticsearch. Demonstrated leadership in automating complex workflows using Python and Bash scripts, achieving significant cost savings and operational efficiency. Committed to delivering scalable, secure, and high-availability solutions while maintaining comprehensive documentation in Jira and Confluence for operational continuity.

Education

Master of Computer Science, 2015 – Pondicherry University, India

Bachelor of Computer Science, 2010 – Thiruvalluvar University, India

Professional Certifications: Microsoft Certifications Transcript

Microsoft Azure Security Technologies

Microsoft Azure Administrator

Designing Microsoft Azure Infrastructure Solutions

Managing Office 365 Identities and Requirements

Professional Skills

Cloud Platforms: AWS, GCP, Azure

Database Admin: MongoDB, Elasticsearch, SQL, OpenSearch, DocumentDB

Monitoring/Logging: CloudWatch, Grafana, Prometheus, New Relic, Nagios, SolarWinds

Container Management: Kubernetes, Minikube, Docker, Portainer, Rancher

Infrastructure Mgmt: Ansible, Terraform

Version Control: Git, Azure Repos, Bitbucket, AWS CodeCommit

CI/CD Tools: Jenkins, Azure DevOps Pipelines, GitLab Pipelines, AWS CodePipeline

Virtualization: Hyper-V, VMware (vSphere, vCenter, ESXi)

Operating Systems: Windows Server (AD, DNS, DHCP, WSUS), Linux (Ubuntu, CentOS)

Backlog Management: Jira, Azure Boards, Confluence

Networking: Firewalls (FortiGate, Sophos), Switches, VLANs, VPN (OpenVPN, Pritunl)

Backup Solutions: Veritas NetBackup, AWS S3 Lifecycle Policies, Azure Backup Vault

Email Servers: Postfix, Dovecot, Microsoft Exchange Online

Scripting: Python, Bash, PowerShell

Microsoft 365: Conditional Access, DLP, Azure AD, Exchange Online

Professional Experience

Cloud Engineer April 2024 – April 2025

MD-Think (State of Maryland)

Architected infrastructure using Terraform, provisioning AWS resources (VPCs, subnets, IAM roles) across multi-account environments, reducing setup time by 70% with reusable modules.

Orchestrated Jenkins pipelines with dynamic stages, integrating Docker builds and deployments to AWS ECS, enabling zero-downtime releases for 20+ microservices.

Containerized critical applications with Docker, optimizing image layers and implementing Dockerfile best practices to reduce build times by 30%.

Managed Git and Bitbucket repositories, enforcing branch protection rules and automating merge conflict resolution with webhooks, improving code quality by 25%.

Streamlined project tracking with Jira and Confluence, creating automated workflows and knowledge bases to reduce incident resolution time by 20%.

Optimized OpenSearch clusters, configuring fine-grained access control and index sharding to handle 10M+ daily queries with sub-second latency.

Administered DocumentDB clusters, implementing TTL indexes and backup retention policies, ensuring compliance with state data retention requirements and 99.99% uptime.

Deployed New Relic observability dashboards via Terraform, automating metric collection for 100+ services, and integrated Grafana for real-time infrastructure visualization, cutting issue detection time by 35%.

Engineered CI/CD pipelines using Jenkins and AWS CodePipeline, incorporating automated testing and canary deployments, achieving a 95% deployment success rate.

Automated resource cleanup and compliance checks using Python scripts triggered by Jenkins, identifying and terminating unused AWS resources, saving $10K monthly in cloud costs.

CloudOps Engineer February 2021 – March 2024

Constantin - NY, USA (Offshore)

Engineered Linux server optimizations by implementing cgroup resource limits and systemd service configurations, boosting application performance by 25% under high load.

Architected AWS and GCP hybrid infrastructure, leveraging AWS Direct Connect and GCP Interconnect to ensure low-latency data transfer across 10+ regions.

Developed Ansible playbooks for zero-downtime rolling updates of 200+ EC2 and Compute Engine instances, reducing deployment risks and manual intervention by 50%.

Administered OpenSearch clusters, implementing index lifecycle management with rollover policies to optimize storage costs and search performance by 30%.

Managed DocumentDB clusters in AWS, configuring multi-AZ replication and automated snapshots, achieving 99.999% availability for critical workloads.

Designed CloudWatch dashboards with custom metrics and Lambda triggers, automating incident remediation for CPU spikes, cutting resolution time by 40%.

Orchestrated Docker containers in production, using Docker Compose for multi-container applications and integrating with AWS ECS for scalable deployments.

Deployed Minikube for local Kubernetes testing, simulating production clusters to validate Helm charts before deploying to AWS EKS, reducing errors by 20%.

Managed Git workflows in AWS CodeCommit, implementing CI/CD hooks with Jenkins to automate testing and deployments, improving release cycles by 35%.

Led AWS to GCP migration for 50+ VMs and 10TB of data, using Velostrata for live migrations and Terraform for infrastructure recreation, completing with zero downtime.

Configured Jenkins pipelines with Groovy scripts, integrating with AWS CodePipeline for automated builds and deployments across multi-account environments.

Built AWS CodePipeline workflows, incorporating CodeBuild for containerized builds and CodeDeploy for blue-green deployments, ensuring 99.9% uptime.

Authored Terraform modules for reusable infrastructure, provisioning VPCs, subnets, and IAM policies across AWS and GCP, reducing setup time by 60%.

Automated deployment workflows with Python and Bash scripts, including rollback of failed deployments and configuration syncing, saving 10 hours weekly.

Managed Microsoft 365 tenant configurations, implementing Conditional Access policies and DLP rules to secure 500+ user accounts, reducing phishing incidents by 15%.

Deployed Azure Virtual Networks and Application Gateways, configuring WAF rules and SSL termination to protect web applications, achieving 99.95% availability.

CloudOps Engineer January 2019 – January 2021

Techminds LLC, Chennai, India

Administered Linux servers (Ubuntu, CentOS), optimizing system performance through kernel tuning and automating routine maintenance tasks with Bash scripts, ensuring 99.99% uptime for critical infrastructure.

Deployed and managed AWS EC2 instances and S3 buckets, implementing IAM roles and security groups to enforce least-privilege access, reducing security incidents by 30%.

Configured GCP Compute Engine and Cloud Storage, setting up VPC peering for secure inter-project communication, supporting multi-region application deployments.

Managed Azure Virtual Machines and Blob Storage, automating resource provisioning with Azure CLI scripts to streamline infrastructure scaling for 50+ workloads.

Utilized Azure Boards for backlog management, creating epics and user stories to track infrastructure projects, improving team collaboration and project delivery by 25%.

Maintained Azure Repos and AWS CodeCommit for infrastructure code, enforcing branch policies and pull request approvals to ensure high-quality deployments.

Automated infrastructure configuration using Ansible playbooks, managing 100+ servers for consistent patching and software updates, reducing manual effort by 40%.

Administered MongoDB and Elasticsearch clusters, optimizing shard allocation and indexing strategies to improve query performance by 20% for high-traffic applications.

Configured and maintained Postfix email servers on Linux, integrating with Dovecot and SPF/DKIM for secure email delivery, supporting 10,000+ daily emails with 99.9% deliverability.

Deployed Grafana and Prometheus for infrastructure monitoring, creating custom dashboards to track server health and network latency, enabling proactive issue resolution with 15% faster incident response.

Sr. SysOps Engineer August 2015 – November 2018

Integra Software Service, Pondicherry, India

Implemented Windows Server role-based access controls, configuring WSUS for automated patch management across 200+ servers, reducing vulnerabilities by 25%.

Optimized Linux server performance by tuning kernel parameters and managing LVM for dynamic storage allocation, supporting high-traffic applications.

Deployed Hyper-V replication for disaster recovery, ensuring real-time VM synchronization across sites with a 5-minute RPO.

Managed VMware vSphere environments, utilizing vMotion for live VM migrations to balance resource utilization without service interruptions.

Configured Git repositories in Bitbucket, enforcing pull request reviews and automated hooks for code quality checks, improving deployment reliability.

Customized Jira Service Management with automation rules to escalate critical tickets, reducing mean time to resolution by 20%.

Administered GoDaddy and Hostinger cPanel for hosting management, automating domain renewals and configuring ModSecurity as a web application firewall.

Configured Cisco ASA firewalls with VPN tunnels (IPsec, SSL) to enable secure remote access for 100+ users, ensuring encrypted data transmission.

Optimized network switch performance by implementing QoS policies on Cisco Catalyst switches, prioritizing VoIP and critical application traffic.

Managed Veritas NetBackup for enterprise backups, implementing deduplication to reduce storage costs by 30% while maintaining daily snapshots.

Wrote automated backup verification scripts to validate data integrity, ensuring a 100% restore success rate during quarterly DR drills.

Monitored network firewall traffic with SolarWinds, analyzing bandwidth usage and blocking malicious IPs to prevent DDoS attacks.

Created detailed runbooks in Confluence for firewall rule changes and backup recovery processes, enabling faster incident response and team onboarding.

Configured AWS EC2 instances for basic workloads, learning to deploy and manage virtual servers with guidance from senior team members.

Set up AWS S3 buckets for data storage under supervision, implementing basic lifecycle policies to archive data and reduce costs.

Server Admin April 2010 – June 2012

Lenovo, Pondicherry, India

Administered Windows Server environments, managing Active Directory, DNS, DHCP, and Group Policy configurations for secure and efficient operations.

Performed Linux server administration, including setup, maintenance, security hardening, and troubleshooting to ensure robust infrastructure.

Managed Hyper-V environments, creating and configuring virtual machines for optimized workloads.

Utilized VMware vSphere and ESXi for virtualization, executing P2V migrations to transition physical servers to virtual environments.

Conducted pre-migration checks and post-migration validations during P2V conversions to ensure data integrity and minimal downtime.

Monitored VM server performance using ManageEngine, setting up CPU, memory, and disk usage alerts to maintain uptime.

Configured website availability and performance monitoring with UptimeRobot, enabling rapid issue detection and resolution.

Used Jira as a ticketing portal to track and resolve incidents, collaborating with cross-functional teams to meet SLAs.

Documented system configurations and migration processes, maintaining comprehensive records for operational continuity.

Applied security patches and updates to Windows and Linux servers, ensuring compliance with organizational security policies.

Architecture Design

AWS Cloud Infrastructure: Designed a scalable mobile app infrastructure with EC2, S3, and load balancers, incorporating serverless cost management.

GCP Cloud Infrastructure: Built an e-commerce app infrastructure with Elasticsearch, serverless functions, and email gateways.

Azure Data Center: Designed a virtual desktop environment for 500 users with file shares and network segmentation.

Key Projects:

M365 and Exchange Cloud Migration: Migrated users to M365, implementing security, compliance, and MDM configurations.

AWS to GCP Migration: Live-migrated MongoDB and SQL databases, VMs, and storage buckets with zero downtime.

AWS Data Center Setup: Built a secure environment for mobile app development, including MongoDB clusters and Jenkins automation.

SharePoint Cloud Management: Managed 500 SharePoint sites, integrating with Azure AD for user permissions.

On-Prem to Azure Migration: Migrated Linux VMs and NAS storage, configuring Azure AD and file shares.

Business Central SaaS to On-Prem: Migrated database and application to Azure using peering connections.

MongoDB Live Migration: Migrated MongoDB from AWS to Azure with zero downtime using cluster replication.

Postfix Email Server Migration: Moved CentOS Postfix/Dovecot server from AWS to GCP, preserving user data.
