DevOps Engineer Cloud Infrastructure

Location: Houston, TX
Posted: May 20, 2025

Resume:

Name: Akhila M

Sr. Cloud/DevOps Engineer

Phone: +1-571-***-****

Email: **********@*****.***

LinkedIn: https://www.linkedin.com/in/akhila-m-b215b4226/

As a highly skilled Cloud/DevOps Engineer with 8+ years of experience across multiple industries, I have led and executed complex projects involving cloud infrastructure, CI/CD pipeline automation, and infrastructure management. My expertise spans AWS, Azure, and Kubernetes, using tools like Jenkins, GitHub Actions, Terraform, and Ansible to drive automation and streamline deployments. I have a proven track record in provisioning and managing scalable, secure cloud environments, implementing Infrastructure as Code (IaC), and optimizing development workflows for efficiency and reliability. With a strong background in automation, monitoring, and containerization, I consistently deliver high-quality, on-time project outcomes.

Experience in deploying and managing cloud infrastructure on AWS and Azure, utilizing services like EC2, S3, VPC, and Azure Virtual Machines for scalable solutions.

Strong proficiency in Infrastructure as Code (IaC) using Terraform, Ansible, and CloudFormation for automating the provisioning and management of cloud resources.

Expertise in designing and implementing CI/CD pipelines using Jenkins, GitHub Actions, and Azure DevOps, ensuring seamless and automated deployments across environments.

Skilled in containerization with Docker and Kubernetes, managing containerized applications and orchestrating deployments with AWS EKS, Azure AKS, and Helm.

Experienced in managing security with IAM, RBAC, Azure Key Vault, and AWS Secrets Manager to ensure compliance and data protection across cloud environments.

Knowledgeable in monitoring and logging with tools like CloudWatch, Prometheus, and Splunk, providing insights into system performance and enabling proactive issue resolution.

Proficient in version control systems such as Git and SVN, ensuring efficient code management and collaboration across development teams.

Effective in database management, including working with RDS, DynamoDB, and SQL databases, ensuring data availability, backup, and recovery.

Strong collaboration and project management skills, utilizing JIRA and Confluence to manage tasks, track progress, and document processes across teams.

Technical Skills:

Cloud Platforms: AWS (EC2, S3, IAM, RDS, Route 53, VPC, Lambda, EKS, ECS, ElastiCache for Redis, CloudFormation), Azure (Virtual Machines, Azure Kubernetes Service, Azure Active Directory, Resource Manager, SQL Database, Key Vault, Managed Redis - familiar)

DevOps Tools: Jenkins, GitHub Actions, GitLab CI, Terraform, Ansible, Chef, Puppet, Helm, Argo CD, Argo Workflows

Containerization & Orchestration: Docker, Docker Compose, Kubernetes (AWS EKS, Azure AKS), AWS ECS (familiar), Helm Charts, AWS ECR, Kubeflow Pipelines

CI/CD Tools: Jenkins, Harness, GitHub Actions, Azure DevOps, SonarQube, Nexus, Maven, Gradle

Infrastructure as Code (IaC): Terraform, CloudFormation, ARM Templates, Ansible Playbooks

Version Control Systems: Git, GitHub, SVN, Bitbucket

Monitoring & Logging: AWS CloudWatch, Prometheus, Grafana, Nagios, Splunk, Datadog, Dynatrace

Scripting Languages: Python, Bash, Shell Scripting, GoLang, Node.js, PowerShell

Configuration Management: Ansible, Chef, Puppet

Security & Compliance: AWS IAM, AWS Security Hub, Azure Key Vault, SAML SSO, Role-Based Access Control (RBAC)

Databases: Amazon RDS (PostgreSQL, MySQL), Azure SQL Database, DynamoDB, Redis (including core management and modules like JSON, Timeseries)

Collaboration Tools: JIRA, Confluence, Slack, Microsoft Teams

Operating Systems: Linux (Red Hat, CentOS, Ubuntu), Windows Server

Networking & Load Balancing: AWS VPC, Azure Virtual Network, NGINX, AWS Load Balancer, Azure Load Balancer, Network Security Groups (NSGs)

Work Experience:

Sr. Cloud DevOps Engineer

Pluto TV (August 2024 – Present)

Set up and configured Harness and GitHub Actions to automate deployments, enforce gated approvals, and streamline CI/CD workflows with auto rollbacks and canary releases.

Designed and optimized CI/CD pipelines using GitHub Actions, Terraform, GoLang, and Jenkins, cutting manual effort by 60% and accelerating release cycles.

Built scalable, fault-tolerant AWS infrastructure using Terraform (EC2, VPC, EKS, ALB/NLB), supporting containerized and serverless applications.

Developed and deployed serverless REST APIs using AWS Lambda and Flask, integrating with RDS and S3 and securing deployments via AWS Secrets Manager.

Led migration of internal .NET tools to AWS using Terraform and CI/CD pipelines, achieving compliance with STIG and NIST 800-53 standards in GovCloud.

Designed and maintained Concourse CI/CD pipelines with modular tasks, DAG-based architecture, and integrations with GitHub, S3, and Docker, enabling blue-green and canary deployments.

Migrated legacy Jenkins pipelines to Concourse, reducing maintenance overhead and improving traceability.

Managed containerized workloads with Docker, EKS, Helm, and Argo Workflows, also leveraging Kubeflow Pipelines for ML automation.

Automated infra provisioning and remediation using Python (Boto3) and Ansible playbooks, improving reliability and reducing manual intervention.

Hardened RHEL 7/8 Linux servers, applied STIG compliance, and used Red Hat Satellite for automated patching and provisioning.

Integrated Linux hosts with Windows AD (via SSSD/Kerberos) and managed offline deployments via YUM repos.

Implemented IRSA with OIDC federation for secure EKS access without hardcoded credentials.

Built real-time observability pipelines using Prometheus, Grafana, Splunk, Datadog, and Redis TimeSeries, reducing troubleshooting time by 40%.

Developed Kafka-based real-time data pipelines and managed monitoring dashboards.

Migrated infrastructure from CloudFormation to Terraform Enterprise, adopting IaC best practices and version control.

Hardened cloud environments using AWS WAF and Cloudflare, and remediated findings reported by AWS Security Hub.

Created performance dashboards and alerting workflows using CloudWatch, enabling faster incident response.

Provided 24/7 support for production systems with 99.9%+ uptime across AWS and Kubernetes platforms.

Technologies Used:

Cloud: AWS (EC2, VPC, EKS, IAM, S3, CloudWatch, Lambda, WAF, Secrets Manager)

Tools: Harness, GitHub Actions, Jenkins, Terraform Enterprise, Helm, Argo Workflows, Kubeflow Pipelines, Prometheus, Grafana, Datadog, Splunk, Kafka, Redis TimeSeries, Docker, SonarQube, Nexus, Cloudflare

Sr. Infrastructure/DevOps Engineer

Informatica (June 2022 – Dec 2023)

Applied security best practices using Azure NSGs and Azure Firewall, ensuring robust data protection and compliance with organizational policies.

Designed and implemented secure Azure VNet architectures, integrating subnets, NSGs, and Load Balancers to support scalable and isolated private networks.

Architected and deployed Azure VMs integrated with Azure Cache for Redis; configured Redis replication for high availability and performance.

Automated the deployment of custom Python packages into Azure Databricks Spark clusters using Azure DevOps pipelines, streamlining PySpark-based data workflows and improving pipeline reliability.

Deployed serverless database solutions using Azure SQL Database and enhanced caching with Azure Cache for Redis to ensure efficient data access.

Managed user authentication and access across cloud applications using Azure Active Directory (AAD), enabling SSO and strengthening security.

Enforced Role-Based Access Control (RBAC) policies to maintain least-privilege access and improve governance.

Automated Azure resource provisioning using ARM templates and developed reusable template libraries for consistent and scalable deployments.

Configured geo-replication and failover groups in Azure SQL Database for high availability and disaster recovery.

Explored GoLang for scripting and automated AKS deployments to improve scalability and performance of containerized apps.

Configured Azure Monitor and alert rules to proactively detect issues and anomalies in performance.

Integrated Azure Key Vault for secure management of secrets, keys, and certificates.

Established secure connectivity between on-prem infrastructure and Azure cloud, supporting hybrid cloud models.

Deployed and managed containerized applications using Azure Kubernetes Service (AKS), automating scaling and resource management.

Provisioned and managed RHEL-based Azure VMs using Ansible for patching, software installs, and compliance.

Connected Linux VMs to Active Directory for centralized access control and user policy enforcement.

Automated routine sysadmin tasks using Bash scripting and cron jobs.

Maintained scalable and consistent configuration across environments with YAML-based Ansible playbooks.

Utilized Azure Container Registry (ACR) for secure image storage and integrated with Azure DevOps CI/CD pipelines.

Built CI/CD pipelines with Azure DevOps to automate deployments across VM scale sets behind Azure Load Balancers.

Used Docker to containerize applications and streamline deployments across development and production.

Automated software configuration and deployments using Ansible, reducing manual operations.

Managed sprint planning and tracking through JIRA, ensuring timely project delivery.

Documented infrastructure, workflows, and knowledge transfer materials in Confluence to support team collaboration.

Technologies Used:

Cloud: Azure (Virtual Machines, Network Security Groups, Azure Firewall, Active Directory, SQL Database, Monitor, Key Vault, Virtual Network, Azure DevOps, Load Balancers, DNS)

Tools: Docker, Azure Container Registry, Azure Kubernetes Service, Ansible, JIRA, Confluence

Cloud/DevOps Engineer

Cardinal Health (August 2021 – May 2022)

Managed SSL/TLS certificate lifecycle using AWS Certificate Manager and implemented S/MIME to secure enterprise email communication.

Designed and automated CI/CD pipelines using GitHub Actions, Terraform, and GoLang scripts to support blue-green deployments on AWS EKS.

Built lightweight GoLang CLI tools to manage Kubernetes workloads across AWS EKS and Azure AKS; evaluated compatibility with Google GKE.

Developed and deployed Python-based Flask microservices, automating provisioning with Boto3 and Shell scripts for consistency and scale.

Migrated monolithic applications to Kubernetes using Helm and Argo Workflows, integrating with Docker-based GitHub Actions pipelines.

Managed multi-cloud Kubernetes clusters (EKS, AKS), configuring secure networking with Route 53, private DNS, and RBAC enforcement.

Designed and provisioned multi-tier AWS infrastructure (EC2, S3, RDS, VPC) using Terraform, achieving HA and fault tolerance.

Integrated SonarQube, Pylint, and Pytest into CI pipelines for code quality, test automation, and early vulnerability detection.

Enhanced observability with Grafana, Splunk, Datadog, and Redis TimeSeries, enabling real-time alerting and monitoring in Kubernetes clusters.

Automated AMI snapshot creation, disaster recovery workflows, and compliance auditing with scheduled scripting.

Created GitHub Actions workflows for semantic versioning, auto-tagging, and JIRA ticket integration, improving traceability.

Implemented IAM roles for service accounts (IRSA) to securely bind EKS pods to AWS IAM, eliminating hardcoded credentials.

Integrated AWS Secrets Manager and Parameter Store into EKS applications for secure secret and config management.

Managed Terraform remote state using S3 and DynamoDB, ensuring consistent collaboration and safe state locking.

Participated in on-call rotations, resolved P1/P2 incidents, and developed automation runbooks to reduce MTTR.

Used CloudWatch Logs Insights and custom metrics to troubleshoot production issues and optimize system performance.

Led container security initiatives by integrating OWASP scanning and image vulnerability detection into CI/CD pipelines.

Conducted code reviews, contributed to DevOps standards documentation, and mentored team members on Terraform and GitHub Actions.

Technologies Used:

Cloud: AWS (EC2, IAM, Route 53, S3, RDS, DynamoDB, SNS, VPC)

Tools: GitHub Actions, Terraform, Jenkins, Helm, Docker, SonarQube, Splunk, Maven, JIRA, Nexus, AWS CLI, Bash, Python, Shell Scripting, Node.js, Java, SVN

OS: Red Hat Linux, Solaris, CentOS 7.x, Ubuntu

DevOps Engineer

Waymo LLC (Feb 2020 – July 2021)

Designed and implemented CI/CD pipelines using GitHub Actions and Cloud Build, enabling fast, reliable deployment of containerized applications.

Provisioned GCP infrastructure using Terraform to automate deployment of core services: GKE, Cloud Storage, Cloud SQL, Pub/Sub, and VPCs.

Built and deployed Python-based APIs and internal tools using Cloud Functions, App Engine, and Flask, integrated with GCP’s Identity-Aware Proxy (IAP) for secure access.

Managed multi-zone GKE clusters, ensuring auto-scaling, pod disruption budgets, and node pool isolation for different workloads.

Developed Helm charts and reusable deployment templates for internal ML microservices.

Integrated Prometheus and Google Cloud Monitoring (Stackdriver) with Grafana dashboards to observe system behavior and trigger alerts based on latency and memory thresholds.

Used Pub/Sub to decouple event-driven pipelines, including simulation job queues and ML result processing.

Secured resources using IAM roles, service accounts, and Workload Identity Federation to eliminate key-based access.

Worked closely with SRE and ML teams to support production ML workflows and automated rollout of new models across GKE environments.

Developed custom Terraform modules to manage reusable infrastructure patterns for GKE, Cloud NAT, firewall rules, and IAM policies across environments.

Enabled blue-green and canary deployments on GKE using GitHub Actions, Argo Rollouts, and health probes, ensuring zero-downtime upgrades for internal AI tools.

Automated backup and disaster recovery workflows for Cloud SQL and PersistentVolumes on GKE using Cloud Scheduler and custom Bash scripts.

Implemented cost governance dashboards using BigQuery and Looker Studio, monitoring GKE and Cloud Run cost breakdowns by environment and service owner.

Integrated Google Secret Manager with CI/CD workflows for secure credential rotation and access management in GitHub Actions and Terraform pipelines.

Containerized simulation tools and telemetry processors using Docker, pushed artifacts to Artifact Registry, and orchestrated multi-stage pipelines in GKE.

Built centralized logging and tracing pipelines using Cloud Logging, Cloud Trace, and OpenTelemetry, helping SRE teams reduce MTTR by 45%.

Used Workload Identity Federation to enable secure service-to-service communication between GKE, Cloud Functions, and Pub/Sub without hardcoded keys.

Conducted GKE node pool tuning (preemptible nodes, GPU scheduling, taints/tolerations) to optimize compute usage during large-scale simulation jobs.

Collaborated with data scientists to create automated inference validation workflows triggered by Pub/Sub events, reducing feedback loop latency for ML releases.

Java/DevOps Engineer

Global Logic (August 2017 – Dec 2019)

Designed and implemented Jenkins pipelines to automate build, test, and deployment workflows for Java web applications, laying a strong foundation for later DevOps and cloud roles.

Developed and maintained frontend components using Angular and TypeScript, integrated with backend APIs built using Java Spring Boot and Python microservices.

Used Maven for dependency management and Gradle for complex build tasks, including automated testing and code analysis.

Managed source code with Git and SVN, implemented branching strategies, and integrated GitHub with Jenkins to trigger builds on commit.

Configured Nexus for storing versioned build artifacts (JARs, Docker images) and enabled consistent deployment workflows.

Used Postman and Fiddler for API testing and debugging, optimizing backend service interactions.

Automated environment provisioning and configuration using Chef, and created reproducible developer environments using Vagrant.

Integrated SonarQube with Jenkins to perform static code analysis, enforcing security and code quality standards.

Implemented Nagios to monitor CI/CD pipeline health, server status, and application availability with proactive alerting.

Managed RHEL 7/8 systems, ensuring patch compliance and system stability in production and dev environments.

Used Red Hat Satellite for lifecycle management, patch automation, and provisioning of Linux systems.

Created modular Ansible playbooks (YAML) for software deployment, security hardening, and patch management, integrated with Red Hat Satellite.

Provisioned and managed virtual machines using VMware vSphere and KVM, automating snapshot and image-based deployments.

Configured secure offline package installations using internal YUM and APT repositories in isolated environments.

Managed artifact repositories with Nexus and Artifactory, integrating into Jenkins for build artifact management.

Integrated Linux servers with Active Directory using SSSD, Kerberos, and LDAP for centralized authentication and access control.

Automated AD-based permissioning with Ansible, aligning with enterprise IAM policies across Linux hosts.

Conducted vulnerability scanning and implemented remediation using Ansible and SCAP content.

Automated release management with Jenkins, including version tagging, release note generation, and rollback mechanisms.

Transitioned from Java to DevOps, exploring GoLang for prototyping automation tools in CI/CD pipelines.

Implemented full test automation pipelines (unit, integration, security) triggered on every commit using Jenkins.

Automated infrastructure configuration using Chef scripts and ensured environment consistency with Vagrant.

Built robust rollback strategies in Jenkins to restore stable builds in case of deployment failures.

Technologies Used: Git, SVN, Jenkins, Maven, Gradle, Nexus, Chef, Vagrant, SonarQube, Nagios, Bash, Python.

Cloud/Infrastructure: AWS, Docker, Kubernetes, Terraform.

Education:

Master’s: MS in Information Systems Technology, Wilmington University, Delaware, USA.

Bachelor’s: Bachelor of Technology in Electronics and Communication Engineering, HITAM, Hyderabad, India.


