Cloud Infrastructure & SRE Engineer

Location:

Anna, TX

Salary:

95000 - 125000

Posted:

March 09, 2026


Resume:

BONAVENTURE (BONO) CHINANGA

CLOUD INFRASTRUCTURE ENGINEER | DEVOPS ENGINEER | SITE RELIABILITY ENGINEER (SRE) | ANNA, TX

+1-651-***-**** ***********@*****.***

PROFESSIONAL SUMMARY

AWS-certified Cloud Infrastructure & SRE professional with 4+ years of experience building and maintaining highly reliable, scalable AWS environments for fintech and semiconductor workloads. Expert in Infrastructure as Code (Terraform, CloudFormation), container orchestration (EKS, Docker), CI/CD automation, observability (CloudWatch, Prometheus, Grafana), incident management, SLO/SLI definition, toil reduction, and cloud cost optimization. Delivered 99.99% uptime, reduced deployment times by up to 70%, cut cloud costs 25-40%, and improved MTTR through proactive reliability engineering.

TECHNICAL SKILLS

Cloud & Platforms: AWS (EC2, S3, VPC, Lambda, RDS, ECS, EKS, Elastic Load Balancing, Auto Scaling, CloudFormation, IAM, Route 53, CloudWatch)

IaC & Automation: Terraform, Ansible, Python, Bash

CI/CD & Tools: Jenkins, GitHub Actions, Git

Containers & Orchestration: Docker, Kubernetes (Amazon EKS)

Monitoring & SRE Practices: Prometheus, Grafana, AWS CloudWatch, ELK Stack, SLO/SLI, Error Budgets, MTTR/MTTD Reduction, On-Call Incident Response, Blameless Post-Mortems

Other: Linux Administration, Networking (VPC Peering, Subnets, Security Groups), Security Best Practices (Encryption, Least Privilege IAM), Chaos Engineering, Cost Optimization (Rightsizing, Reserved Instances)

PROFESSIONAL EXPERIENCE

Cloud Infrastructure Engineer / Analog Devices February 2024 - Present

Led migration of compute-intensive semiconductor design workloads from on-premises to AWS multi-AZ architecture using Terraform and EC2 Auto Scaling Groups; achieved 99.99% uptime for simulation tools and reduced provisioning time from days to hours.

Designed and implemented an observability stack with Prometheus, Grafana, and CloudWatch for real-time monitoring of high-performance computing clusters; defined custom SLIs/SLOs (99.95% target), managed error budgets, and reduced MTTR from 45 minutes to under 15 minutes during incidents.

Automated infrastructure provisioning and configuration drift remediation using Terraform modules and Ansible playbooks for development, staging, and production environments; reduced manual toil by 80% and accelerated environment spin-up for engineering teams by 65%.

Optimized AWS resource utilization through instance rightsizing, Savings Plans adoption, and automated tagging policies; identified and remediated underutilized EC2/RDS resources, delivering 30-35% monthly cost reduction on compute and storage while maintaining performance for CAD workloads.

Enforced security hardening (IAM roles with least privilege, KMS encryption, VPC security groups, and AWS Config rules) across semiconductor IP environments; passed internal audits with zero critical findings and mitigated potential vulnerabilities proactively.

Conducted chaos engineering experiments (using AWS Fault Injection Simulator) and led blameless post-mortems for production incidents; improved system resilience and prevented recurrence of 3 major outages in the last year.

DevOps Engineer / Cloud Engineer / SoFi January 2022 - January 2024

Built and maintained CI/CD pipelines with Jenkins and AWS CodePipeline for microservices deployments to Amazon ECS and EKS; reduced release cycle from 2 weeks to daily deployments and decreased failed deployments by 65% through automated testing and blue/green strategies.

Migrated legacy monolithic fintech applications to cloud-native AWS services (Lambda for serverless APIs, RDS Aurora for databases, S3 for secure document storage); improved transaction processing scalability to handle 2M+ daily peak loads and boosted overall system performance by 50%.

Orchestrated containerized workloads on Amazon EKS with Docker; implemented Horizontal Pod Autoscaler, Cluster Autoscaler, and service mesh for zero-downtime rolling updates, resulting in 99.99% availability during high-traffic periods like payday loan processing spikes.

Established comprehensive monitoring and alerting using CloudWatch, Prometheus, and Grafana; set up proactive dashboards for key metrics (latency, error rates, throughput), reduced MTTD by 45%, and ensured SLO compliance for critical financial services.

Implemented security controls including AWS WAF, GuardDuty, IAM policies with MFA, and encryption in transit and at rest; maintained zero critical security incidents over 2 years and supported SOC 2 compliance for fintech regulatory requirements.

Drove cloud cost optimization initiatives using AWS Cost Explorer, Trusted Advisor, and auto-scaling policies; rightsized over-provisioned resources and adopted Reserved Instances, achieving 35-40% reduction in monthly AWS spend while supporting rapid business growth.

Participated in 24/7 on-call rotations as SRE; resolved P1/P2 incidents, authored runbooks, and implemented automated remediation scripts (Python/Lambda), cutting average incident resolution time by 50% and reducing toil.

CERTIFICATIONS

AWS Certified Solutions Architect - Associate (SAA-C03)

AWS Certified Cloud Practitioner (CLF-C01)
