DevOps & Cloud Platform Engineer - SRE - DevSecOps Expert

Location:

Dallas, TX

Posted:

December 11, 2025

Resume:

Sai Kumar

DevOps Engineer | Cloud Platform Engineer | DevSecOps Engineer | SRE

***.*****.*********@*****.*** +1-216-***-****

SUMMARY

•DevOps / DevSecOps Engineer with 9+ years of experience building secure, automated, and highly reliable cloud platforms across AWS, Azure, and GCP for finance, healthcare, and technology clients.

•Strong background in enterprise cloud networking including VPCs, VNets, private endpoints, peering, routing, DNS, load balancing, and secure connectivity across multi-account environments.

•Experience modernizing legacy systems through automation, containerization, infrastructure codification, and CI/CD transformation, reducing operational overhead and improving deployment velocity.

•Hands-on experience implementing GitOps workflows using ArgoCD and Helm with automated reconciliation and configuration drift detection across Kubernetes clusters.

•Skilled at automating operational tasks including patching, configuration updates, scaling workflows, system remediation, and compliance enforcement using Ansible, SSM, Azure Automation, and scripting.

•Proven success aligning cloud platforms with SOC2, PCI, HIPAA, and internal governance standards through IAM controls, security policies, guardrails, and automated compliance checks.

•Experience improving on-call readiness by tuning alerts, reducing noise, creating runbooks, building automated remediation steps, and strengthening incident response processes.

•Deep expertise integrating cloud-native observability systems such as Prometheus, Grafana, ELK/EFK, CloudWatch, Azure Monitor, and GCP Ops Suite to improve performance monitoring, diagnosis, and reliability.

•Skilled at cost and performance optimization across AWS, Azure, and GCP, including compute rightsizing, optimizing autoscaling behavior, tuning storage tiers, and implementing lifecycle policies.

•Strong foundation in SRE principles including SLIs, SLOs, SLAs, error budgets, capacity planning, release safety, rollout strategies, and resilience engineering.

•Known for building strong partnerships with developers, QA, SREs, platform teams, and security stakeholders to streamline delivery, resolve incidents faster, and improve engineering maturity.

•Experienced in creating technical documentation, environment standards, IaC guidelines, architectural references, and onboarding materials to accelerate DevOps adoption.

•Supported large-scale cloud and Kubernetes migrations by guiding workload readiness, designing secure architectures, validating deployment models, and improving operational patterns.

•Adept at improving developer experience by simplifying CI/CD workflows, optimizing pipelines, reducing manual steps, and enabling self-service infrastructure and deployment capabilities.

•Familiar with event-driven and distributed systems using Kafka, Pub/Sub, and SQS/SNS to support high-volume, reliable, and observable application architectures.

TECHNICAL SKILLS

Category: Skills / Tools

Cloud Platforms: AWS, Azure, GCP

CI/CD & Automation: GitLab CI/CD, GitHub Actions, Jenkins, Bitbucket Pipelines, Azure DevOps

Infrastructure as Code: Terraform, CloudFormation, ARM/Bicep, Terragrunt

Kubernetes & Containers: EKS, AKS, GKE, Docker, Helm, Kustomize, ArgoCD, GitOps

Configuration Management: Ansible, Chef, Puppet, Packer, Cloud-Init

Monitoring & Logging: Prometheus, Grafana, ELK/EFK, CloudWatch, Datadog, Loki

Security & DevSecOps: Vault, AWS Secrets Manager, OPA, Checkov, TLS, mTLS, Zero Trust

Scripting: Python, Bash, Groovy, PowerShell

Systems & Networking: Linux, TCP/IP, Routing, SSL, NAT

Collaboration: Agile, SRE Practices, Troubleshooting, On-Call

CERTIFICATIONS

•AWS Certified Solutions Architect – Professional

•Certified Kubernetes Administrator (CKA)

•Microsoft Certified: Azure Administrator Associate (AZ-104)

EDUCATION

Trine University – MS in Information Studies, 2022

AVNIT – BS in Computer Science, 2016

PROFESSIONAL EXPERIENCE

Truist Bank – Charlotte, NC | Sep 2024 – Present

Senior DevOps/DevSecOps Engineer

Project Summary:

Supporting cloud modernization and platform engineering efforts for internal banking applications hosted on AWS. My work focuses on improving the reliability and security of the cloud environment, building automated deployment workflows, and creating standardized infrastructure patterns so development teams can deliver software faster and with fewer operational issues. I collaborate with application, security, and platform teams to ensure the AWS environment follows consistent best practices and meets the organization’s compliance and performance requirements.

•Designed and maintained enterprise CI/CD pipelines using GitLab CI/CD. Added automated testing, container builds, environment promotion, and security validation to support consistent and reliable deployments.

•Built reusable Terraform modules for VPC networking, IAM, EKS, EC2, RDS, S3, and logging patterns. Helped teams provision infrastructure in a standardized and compliant way across several AWS accounts.

•Implemented structured infrastructure deployments using Terraform Cloud remote state, workspaces, and AWS Organizations. Ensured clear separation between development, staging, and production environments.

•Migrated internal applications to Amazon EKS by configuring node groups, autoscaling, RBAC rules, Ingress controllers, and Helm-based deployments to improve resilience and runtime consistency.

•Automated patching, AMI building, and configuration updates using Ansible, Packer, and AWS Systems Manager. Reduced manual overhead and improved operational compliance.

•Integrated DevSecOps controls into GitLab pipelines through SAST, SCA, image scanning, SBOM generation, and policy checks using OPA and Checkov. Strengthened security posture and improved early detection of vulnerabilities.

•Designed centralized logging, metrics, and dashboards using CloudWatch, Prometheus, Grafana, and EFK. Improved incident response time and increased visibility into service performance.

•Containerized legacy applications using Docker. Improved resource efficiency, startup times, and deployment consistency across environments.

•Designed IAM roles, permission boundaries, access models, and audit controls aligned with internal security and governance guidelines.

•Implemented secure secret storage using AWS Secrets Manager, Parameter Store, and KMS encryption. Ensured proper rotation, restricted access, and audit readiness for sensitive data.

•Supported production EKS workloads by tuning resource allocations, adjusting autoscaling behavior, configuring PodDisruptionBudgets, and implementing readiness strategies to improve uptime.

•Automated governance workflows including tag compliance, drift detection, idle resource cleanup, and cost reporting. Improved operational hygiene and lowered infrastructure spend.

•Collaborated with security teams to implement AWS Config rules, Service Control Policies, IAM guardrails, and automated compliance checks aligned with SOC2 and PCI requirements.

•Developed operational runbooks, automated remediation scripts, and enhanced on-call playbooks. Reduced manual intervention and improved recovery time during incidents.

•Performed performance tuning across AWS components by optimizing EC2 sizing, refining load balancer configurations, improving caching patterns, and analyzing CloudWatch insights to eliminate bottlenecks.

•Supported GitOps-style deployments for select applications using ArgoCD. Managed Helm charts, applied sync policies, and monitored configuration drift between clusters.

•Participated in architecture reviews and provided guidance on AWS design patterns, networking choices, IAM structures, and secure deployment approaches.

•Improved deployment reliability by refining branching strategies, increasing pipeline test coverage, and adding automated validation gates to reduce release-related issues.

•Mentored junior engineers and developers on Terraform practices, GitLab CI workflows, Kubernetes fundamentals, and secure development practices.

•Contributed to SRE initiatives by defining SLIs and SLOs, tuning alert thresholds, improving monitoring signal quality, and strengthening overall service reliability.

Environment: AWS, GitLab CI/CD, Terraform, Kubernetes, EKS, Helm, Docker, ArgoCD, Ansible, Packer, AWS Systems Manager, CloudWatch, Prometheus, Grafana, EFK, IAM, Secrets Manager, Parameter Store, KMS, OPA, Checkov, Linux, Python, Bash

Kaiser Permanente – Oakland, CA | Feb 2022 – Aug 2024

DevOps Engineer

Project Summary:

Worked as part of the cloud reliability engineering group responsible for the stability, performance, and security of Azure-hosted healthcare platforms. Focused on improving service reliability, strengthening observability, automating operational workflows, and ensuring critical applications met strict availability and compliance requirements. Partnered closely with development, security, and cloud platform teams to build systems that were resilient, scalable, and ready for production at healthcare scale.

•Improved service reliability by designing strong CI/CD workflows in Azure DevOps. Added automated tests, quality gates, rollout validations, and deployment checks to reduce release-related incidents.

•Built ARM and Bicep templates for VNets, NSGs, Application Gateways, Key Vault, private endpoints, and AKS components. Ensured provisioning was consistent, repeatable, and aligned with SRE reliability standards.

•Contributed to Azure Landing Zone foundations by configuring management groups, diagnostic settings, Azure Policies, RBAC patterns, and network baselines that supported multi-team cloud adoption.

•Deployed and maintained AKS clusters with a focus on resiliency. Managed node pools, autoscaling, network policies, RBAC, ingress controllers, and Helm deployments to ensure reliable workload performance.

•Implemented automated reliability checks inside Azure DevOps pipelines, including secure code analysis, dependency scanning, container checks, and drift detection that prevented broken or misconfigured releases.

•Centralized secret management using Azure Key Vault, integrating identity-driven access, managed identities, and pipeline authentication to reduce credential exposure risks.

•Built deep observability using Azure Monitor, Log Analytics, Application Insights, and custom dashboards. Created actionable alerts, KQL queries, and reliability insights used for real-time troubleshooting.

•Led container modernization efforts by migrating legacy applications into Docker and AKS, improving availability through better resource allocation, predictable deployments, and controlled rollout strategies.

•Designed Azure VNet architectures with private connectivity, custom DNS, routing rules, and NSG protections. Ensured network paths supported high availability and proper isolation for sensitive clinical data.

•Automated infrastructure and operational processes using Terraform, Bicep, Bash, and PowerShell. Eliminated manual tasks around provisioning, configuration, and environment baselining.

•Collaborated with identity teams to implement secure access models using AAD, managed identities, service principals, and conditional access aligned with least-privilege policies.

•Reduced operational toil by automating patching, configuration updates, scaling routines, and common remediation actions using Azure Automation and DSC.

•Worked closely with InfoSec to enforce Azure Policies for encryption, diagnostic logging, resource compliance, and network restrictions to ensure HIPAA and internal security requirements were consistently met.

•Supported production AKS workloads, analyzing performance issues, tuning autoscaling triggers, optimizing resource requests, and troubleshooting failures impacting service availability.

•Improved deployment reliability by refining branching strategies, adding pre-deployment validations, strengthening test coverage, and enforcing controlled rollout processes.

•Participated in architecture reviews, advising on Azure reliability patterns including health probes, scaling strategies, network resiliency, identity models, and disaster recovery approaches.

•Implemented backup and recovery patterns for Azure SQL, Storage Accounts, and AKS persistent volumes. Ensured recovery point and recovery time objectives were met for critical workloads.

•Reduced cloud costs without affecting performance by rightsizing compute, optimizing storage tiers, tuning autoscaling parameters, and analyzing patterns via Azure Cost Management.

•Authored runbooks, incident response guides, SRE playbooks, and automated scripts that improved on-call readiness and accelerated mean time to recovery during high-severity incidents.

•Collaborated with engineering teams to define SLIs and SLOs, tune monitoring thresholds, reduce alert noise, and improve overall service health through proactive reliability improvements.

Environment: Azure, Azure DevOps, AKS, ARM, Bicep, Terraform, Azure Monitor, Log Analytics, Application Insights, Key Vault, Azure CNI, Helm, Docker, Azure Policies, AAD, VNets, NSGs, Private Endpoints, Azure Automation, DSC, KQL, Git, PowerShell, Python, Bash

ZeebraCross – Hyderabad, India | Aug 2018 – Jul 2021

Cloud & DevOps Engineer

Project Summary:

Worked as part of the engineering team supporting cloud-hosted applications for enterprise clients on AWS. Focused on implementing DevOps practices, improving deployment workflows, building foundational automation, and supporting day-to-day operations across development, QA, and production environments. Collaborated with developers and QA teams to streamline releases and improve overall infrastructure reliability.

•Built and maintained CI/CD pipelines using Jenkins and GitLab CI/CD. Automated build, test, and deployment steps to reduce manual effort and improve release consistency across multiple application teams.

•Created and managed Terraform templates for EC2 instances, VPC components, IAM roles, S3 buckets, Security Groups, and CloudWatch alarms. Ensured infrastructure provisioning was repeatable and aligned with AWS best practices.

•Assisted with setting up AWS environments including VPC subnets, routing, NAT configurations, EC2 provisioning, IAM policy creation, and S3 lifecycle rules to support application and data needs.

•Containerized application components using Docker and built optimized images to support consistent deployments across dev, QA, and staging environments.

•Supported application deployments by troubleshooting pipeline failures, dependency issues, environment mismatches, and configuration errors during release cycles.

•Implemented basic monitoring and alerting using CloudWatch metrics, logs, alarms, and dashboards. Helped teams investigate performance bottlenecks and availability-related issues.

•Automated routine operations using Bash and Python scripts including cleanup tasks, system checks, log rotations, and environment setup activities.

•Assisted in migrating legacy applications to AWS by provisioning compute, configuring security layers, validating storage requirements, and supporting functional validation in cloud environments.

•Worked with configuration management using Ansible, handling package installations, system updates, application configurations, and server preparation for new environments.

•Helped set up Nginx- and Apache-based web environments including reverse proxy rules, SSL configuration, and service performance tuning.

•Managed IAM permissions, user access roles, and environment-specific policies to ensure secure and controlled access to AWS resources.

•Supported RDS instances by assisting with parameter configuration, connection settings, backup validation, and performance checks under the guidance of senior engineers.

•Provided operational support for infrastructure incidents, deployment issues, monitoring alerts, and system outages. Worked with senior engineers on root cause analysis and preventive remediation.

•Maintained Git repositories and supported development teams with branching strategies, merge conflict resolution, and repository hygiene.

•Worked closely with development and QA teams to validate environment readiness, resolve configuration mismatches, and ensure smooth deployment cycles.

•Participated in on-call rotations and provided timely responses to alerts, collaborating with senior engineers to resolve high-priority issues.

•Monitored AWS costs by identifying idle resources, evaluating underutilized compute, and suggesting cleanup opportunities.

•Conducted basic performance tuning for EC2 instances and web servers by analyzing CloudWatch metrics and adjusting resource allocations.

•Created internal runbooks, environment setup guides, and process documentation to help teams adopt consistent DevOps practices.

•Contributed to broader DevOps adoption within the organization by promoting automation, version control best practices, and cloud-first operational workflows.

Environment: AWS, Jenkins, GitLab CI/CD, Terraform, Ansible, Docker, Linux, CloudWatch, Git, Nginx, Apache, Python, Bash, RDS, EC2, VPC, IAM, S3

InfocusRx – Hyderabad, India | May 2016 – Jul 2018

Systems & DevOps Engineer

Project Summary:

Worked as part of the engineering team supporting Linux-based systems and early cloud adoption across AWS and Google Cloud. Focused on automating deployments, managing servers, and helping application teams migrate initial workloads to cloud environments. Contributed to CI/CD, scripting, monitoring, and infrastructure setup across dev, QA, and staging environments.

•Managed and maintained Linux servers for internal applications, handling package updates, service configuration, user and access management, and system troubleshooting across development and staging environments.

•Developed CI/CD pipelines using Jenkins to automate builds, testing, and deployments, reducing manual work during release cycles.

•Assisted in provisioning AWS and GCP resources including EC2/GCE instances, AWS and GCP IAM roles, storage buckets (S3 and GCS), and basic VPC and networking configurations.

•Automated recurring operational tasks using Bash and Python scripts, improving efficiency for maintenance, cleanup, and environment setup tasks.

•Used Terraform to create foundational infrastructure components such as compute instances, storage, IAM, and networking resources in AWS and GCP.

•Supported early container adoption using Docker, helping build images for development teams and introducing standardized build environments.

•Implemented monitoring and log collection using CloudWatch, Stackdriver (now Google Cloud Operations Suite), and open-source tools to improve visibility into system and application performance.

•Helped configure Nginx and Apache servers including SSL setup, reverse proxies, virtual hosts, and tuning for performance and stability.

•Assisted developers with Git workflows, branching strategies, and resolving merge conflicts to maintain clean source control practices.

•Participated in early cloud migration initiatives by helping prepare server images, validate configurations, and support testing on AWS and GCP.

•Supported database teams with basic MySQL and PostgreSQL administration tasks such as user creation, backup verification, and connectivity troubleshooting.

•Helped resolve network issues involving DNS, firewall rules, security groups, and VPC/subnet connectivity across both AWS and GCP environments.

•Used Ansible to automate environment setup, package installation, and configuration updates for Linux servers.

•Worked closely with senior engineers to ensure cloud resource usage adhered to internal security and access control policies.

•Investigated performance issues on Linux servers, reviewing logs, analyzing resource usage, and tuning configurations to improve application responsiveness.

•Managed IAM policies, service accounts, and access permissions in AWS IAM and GCP IAM to maintain secure and controlled cloud access.

•Participated in on-call rotations and responded to infrastructure incidents, working with senior engineers to resolve production-impacting issues.

•Supported cost awareness by identifying unused resources in AWS and GCP, monitoring storage usage, and suggesting cleanup opportunities.

•Created internal documentation, environment setup guides, and runbooks to support onboarding and ensure consistent DevOps practices.

•Collaborated with developers and QA teams to streamline deployments, resolve environment issues, and support ongoing DevOps adoption.

Environment: AWS, GCP, Jenkins, Terraform, Ansible, Linux (RHEL/Ubuntu), Docker, CloudWatch, Stackdriver, Git, Python, Bash, Nginx, Apache, MySQL, PostgreSQL, EC2, GCE, S3, GCS, IAM, VPC, SSH, Shell Scripting
