Shaun K.
Senior DevOps & Cloud Platform Engineer | SRE | CI/CD Pipelines | Terraform | Multi-Cloud (AWS, Azure, GCP) | Kubernetes
*****.********@*****.*** | 972-***-**** | Jersey City, NJ 07306, USA
Profile
I’m a Senior DevOps & Cloud Platform Engineer with 9+ years of experience designing, automating, and scaling secure, multi-cloud infrastructures across AWS, Azure, and GCP. I have led and optimized Kubernetes, Docker, Terraform, CI/CD, and GitOps pipelines, helping teams reduce deployment times by 40%, cut cloud costs by 25%, and achieve 99.99% uptime. Along the way, I’ve implemented SRE best practices, observability pipelines, and reliability strategies, ensuring highly resilient and performant systems. I enjoy mentoring high-performing DevOps teams and driving platform engineering strategies that align technical solutions with business goals. I’ve worked across healthcare, finance, e-commerce, and enterprise SaaS industries, building systems that are resilient, compliant (SOC2, HIPAA, PCI-DSS), and scalable. My passion lies in creating infrastructure that empowers teams, drives measurable business impact, and fosters continuous innovation, while leveraging FinOps strategies and automated compliance frameworks to deliver operational excellence.
Skills
• Programming & Scripting: Python, Go, Bash, PowerShell, Java, SQL, YAML, JSON
• Systems, OS & Infrastructure: Linux (RHEL, Ubuntu, CentOS), Windows Server, Networking (TCP/IP, DNS, VPN, Load Balancing), VMware ESXi, Hyper-V, Storage Management, Backup & Recovery, and bare-metal provisioning.
• Containerization & Orchestration: Docker, Kubernetes (EKS, AKS, GKE), Helm, OpenShift, AWS ECS, Istio, Linkerd, KEDA, Rancher, Podman, Kyverno (K8s Policy Enforcement), Cluster Autoscaler, and Service Mesh architectures.
• CI/CD & Automation: Jenkins, GitLab CI/CD, GitHub Actions, ArgoCD, Tekton, Spinnaker, Bamboo, Harness, CircleCI, GitOps workflows, Policy-as-Code (OPA, Terraform Sentinel), Blue-Green & Canary Deployments, Automated Rollbacks, and pipeline governance.
• IaC & Config Management: Terraform (modular & reusable IaC), Pulumi, CloudFormation, Packer, Ansible, Chef, Puppet, Helm, Crossplane, Kustomize, environment standardization, and infrastructure lifecycle automation.
• Cloud Platforms & Hybrid Infrastructure: AWS (EKS, ECS, Lambda, RDS, S3, IAM), Azure (AKS, Functions, DevOps, Key Vault), GCP (GKE, Cloud Run, Cloud Functions, Pub/Sub), IBM Cloud, VMware Tanzu, Anthos, OpenStack, Multi-Cloud & Hybrid Environments, Serverless & PaaS, Backstage.io (Internal Developer Platforms).
• Monitoring, Observability & Reliability (SRE): Prometheus, Grafana, Loki, ELK Stack, Splunk, Datadog, New Relic, CloudWatch, Azure Monitor, GCP Operations Suite, OpenTelemetry, Jaeger, PagerDuty, OpsGenie, Incident Response Automation, MTTD/MTTR Optimization, and AIOps-driven Monitoring (Dynatrace Davis AI, Datadog AIOps).
• Security, Compliance & Governance: HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, Prisma Cloud, Aqua Security, Snyk, Trivy, Falco, Sysdig Secure, SonarQube, SOC2, HIPAA, PCI-DSS, Zero Trust Architectures, NIST frameworks, and automated security scanning with policy enforcement.
• Cloud Security & Identity Management: AWS IAM, Azure Active Directory, Google Identity, RBAC, MFA, SSO, Conditional Access, Identity Federation, Access Governance, and Privilege Management.
• FinOps, Cost & Performance Optimization: Infracost, CloudZero, AWS Cost Explorer, Azure Advisor, Rightsizing, Performance Tuning, Autoscaling, Predictive Scaling, Cost Governance, and Capacity Planning.
• Version Control, Collaboration & Agile Practices: Git, GitHub, GitLab, Bitbucket, Jira, Confluence, Slack, MS Teams, Agile & Scrum Methodologies, DevSecOps, Continuous Delivery, and team collaboration.
• Leadership & Strategic Enablement: Team mentorship, infrastructure modernization leadership, cross-team alignment with developers & security, roadmap planning, platform strategy execution, stakeholder communication.
Professional Experience
03/2023 – Present | Lead DevOps & Cloud Platform Engineer | Nerdio
• Led enterprise-wide multi-cloud infrastructure modernization using Terraform, Helm, Kubernetes, and Crossplane, automating 100+ environment deployments and improving provisioning efficiency by 60% while enhancing cross-region reliability.
• Delivered full-stack DevOps transformation for healthcare, fintech, and SaaS workloads, embedding IaC pipelines and automated compliance scans (Trivy, Aqua, Vault) and achieving zero critical audit findings across HIPAA, SOC2, and PCI-DSS.
• Engineered an Internal Developer Platform (PaaS) with Backstage.io, GitOps, and ArgoCD that streamlined service onboarding, observability integration, and versioned IaC templates, cutting release lead time by 55%.
• Designed and implemented service mesh architectures with Istio and Linkerd, strengthening east-west traffic security, improving latency by 25%, and enforcing Zero Trust policies via OPA and Kyverno.
• Enhanced observability and incident response by deploying Prometheus, Grafana, Loki, and Datadog AIOps, integrating anomaly detection and auto-remediation scripts that reduced MTTR by 45%.
• Drove FinOps and cost governance initiatives across AWS, Azure, and GCP using CloudZero and Infracost to rightsize compute and storage workloads, saving $400K+ annually without compromising SLAs.
• Mentored and guided a globally distributed team of 5+ DevOps and SRE engineers, establishing gold standards for IaC, CI/CD, container security, and cross-cloud resilience through design reviews and automation playbooks.
04/2019 – 02/2023 | Senior Site Reliability Engineer | CloudHesive
• Designed and managed high-availability cloud infrastructure across AWS and Azure for critical healthcare platforms serving 2M+ patients, ensuring 99.99% uptime through autoscaling, blue-green deployments, and advanced failover strategies.
• Built observability and reliability frameworks using Prometheus, Grafana, Loki, and ELK Stack to monitor health data pipelines, enabling proactive detection of anomalies and reducing incident frequency by 40%.
• Developed HIPAA-compliant CI/CD pipelines in Jenkins and GitLab CI integrating IaC (Terraform, Ansible) and security scans (Snyk, Trivy), ensuring end-to-end compliance and automated drift correction.
• Collaborated with data and engineering teams to optimize healthcare data processing workflows (FHIR, HL7), improving ETL throughput by 35% and reducing operational overhead through Airflow and Kubernetes CronJobs.
• Automated disaster recovery and cross-region backup replication using Velero, CloudFormation, and Azure Recovery Vaults, cutting RTO/RPO by 60% while meeting regulatory requirements.
• Partnered with DevOps and platform leads to introduce chaos engineering practices (Gremlin, LitmusChaos), validating system resilience under load and strengthening incident response readiness across multiple services.
08/2016 – 03/2019 | DevOps Engineer | EffectiveSoft
• Implemented end-to-end CI/CD pipelines using Jenkins, GitLab, and ArgoCD for microservices running on AWS EKS and Azure AKS, reducing deployment times by 70% and eliminating manual release dependencies.
• Automated infrastructure provisioning with Terraform, Ansible, and Packer to create reusable blueprints across AWS, Azure, and GCP, enabling consistent environment rollout in minutes.
• Deployed and maintained Kubernetes clusters (EKS, AKS, GKE) with Helm and Istio service mesh for traffic routing, observability, and zero-downtime deployments across multi-cloud environments.
• Integrated security and compliance automation into build pipelines using SonarQube, Aqua, and HashiCorp Vault for secrets management, ensuring adherence to SOC2 and PCI-DSS standards.
• Built real-time monitoring and alerting systems with Prometheus, Grafana, ELK, and CloudWatch, achieving early anomaly detection and reducing MTTR by 45%.
• Collaborated with cross-functional teams to implement GitOps workflows and automated policy enforcement (OPA, Kyverno), enhancing governance and speeding up developer onboarding by 50%.
Projects
Multi-Cloud DevOps Platform Modernization (AWS, Azure, Terraform, ArgoCD)
Modernized enterprise CI/CD and infrastructure management for a large-scale SaaS healthcare platform serving 1M+ users across AWS and Azure.
• Architected and automated multi-cloud IaC frameworks with Terraform + Terragrunt, cutting provisioning time by 70% across AWS and Azure.
• Built GitOps-based CI/CD pipelines using Jenkins, ArgoCD, and Helm to enable zero-downtime deployments for containerized workloads.
• Integrated FinOps and Policy-as-Code (OPA, Sentinel) for cost governance and compliance across production environments.
• Mentored 6+ DevOps engineers while leading modernization from legacy CI/CD to container-native pipelines, improving release velocity by 2.3x.
SRE-Driven Reliability Automation for Healthcare Systems (GCP, Kubernetes, Prometheus, Chaos Engineering)
Enhanced reliability, observability, and incident response for critical healthcare workloads while ensuring HIPAA compliance and high availability.
• Designed and deployed a comprehensive observability stack with Prometheus, Grafana, and OpenTelemetry, reducing MTTR by 55%.
• Implemented chaos engineering and reliability testing with LitmusChaos and Gremlin to strengthen healthcare system resilience.
• Automated incident response and remediation using Python and Go scripts, resolving 35% of recurring incidents automatically.
• Ensured HIPAA and SOC2 compliance by embedding continuous security and audit controls into CI/CD workflows.
Education
Bachelor’s in Computer Science
New Jersey Institute of Technology