KUMARA TADEPALLI
+1-850-***-**** ************@*****.***
SUMMARY
Results-driven AWS Cloud Infrastructure Engineer with 4+ years of experience delivering advanced incident management, platform support, and operational excellence across large-scale AWS environments. Proficient in core AWS services, including EC2, S3, VPC, IAM, CloudWatch, CloudFormation, ECS, and EKS, with a strong track record of ensuring high availability and reliability for business-critical applications. Hands-on expertise in Kubernetes and Terraform for infrastructure automation, enabling scalable, repeatable deployments and reducing operational overhead. Adept at serving as the primary escalation point for application developers, conducting thorough root cause analyses, and implementing preventative strategies to minimize recurring incidents. Experienced in building and optimizing CI/CD pipelines, automating operational workflows using Python and Bash, and enforcing security best practices through IAM policies and compliance monitoring. Passionate about enabling developer self-sufficiency through documentation, training, and proactive knowledge sharing to drive faster issue resolution and improved platform performance.
SKILLS
•Site Reliability Engineering (SRE): Monitoring & Observability, Alerting & Dashboards, Incident Management, MTTR, On-Call Operations, Root Cause Analysis (RCA), Postmortem/Blameless Reviews, Error Budgets, Fault Analysis, Traces, Capacity Planning, Performance Tuning, Load Balancing, Auto-Scaling, Disaster Recovery, Runbooks, SLA, SLO/SLI Definition
•Cloud Platforms & Infrastructure: AWS (EC2, EKS, VPC, IAM, CloudWatch, Lambda, S3, RDS, EBS, CloudFormation), Azure (App Services, VMs), Kubernetes, Docker, Linux/Unix, New Relic, Jira, OCI, VMware, ESXi, Cloud Security.
•Infrastructure as Code & Automation: Terraform, Ansible, CloudFormation, Pulumi; Scripting (Python, Bash, PowerShell, Shell); Configuration Management, Virtualization Technology, Linux Kernel, OpenShift, Dynatrace, APM, Elasticsearch, Packer, Arm.
•Observability & Monitoring Tools: Prometheus, Grafana (dashboards, alerts, metric correlation), Datadog, ELK Stack, Splunk, CloudWatch, Logging & Metrics Pipelines, Kibana, Dynatrace, Dashboarding, Kafka.
•CI/CD & DevOps Practices: GitLab CI, CircleCI, Jenkins, SonarQube, GitHub Actions, Argo CD, AWS CodePipeline, AWS CodeDeploy, Build Automation, Release Management, GitOps, Docker, Kubernetes, Bitbucket, Kibana, VMware, Automation tools, TeamCity, Qualys, SIEM.
•Distributed Systems & Data Infrastructure: Distributed system concepts, microservices architecture, fault tolerance, consistency/isolation, database recovery semantics, Backup/Restore strategies, Performance profiling, distributed databases, storage systems, Nagios, Cloudflare, and experience developing applications leveraging Metal for graphics processing.
•Programming Languages: Python, Bash, PowerShell, Shell Scripting, Java, MySQL, Go, C, C++, C#, TypeScript, YAML.
•Security & Compliance: IAM policies/roles, encryption, key management, Endpoint Security, Security Tools, Vault, SIEM.
•Soft Skills: Self-Motivated, Active Listener, Attention to Detail, Decision-Making, Results-Oriented, Agile Methodologies, Mentoring, Articulating, Escalation, Partnership, Building Relationships, Leadership, Approachability, Adaptability, Feedback, Empathy, Coaching, Guidance, Discipline, Technical Presentation, Training Sessions, Tutorials, Verbal Communication.
•Certifications: Azure Certified Administrator, AWS Solutions Architect Associate, AWS Cloud Practitioner
EXPERIENCE
Florida Healthy Kids Corporation Jun 2025 – Present
Cloud DevOps Engineer
•Reduced mean time to resolution (MTTR) for P1/P2 incidents by 40% by implementing centralized monitoring and alerting pipelines using AWS CloudWatch dashboards and alarms across a multi-account environment hosting 100+ applications.
•Improved platform reliability and deployment consistency by 60% by architecting and managing production-grade EKS clusters with auto-scaling node groups, Kubernetes RBAC, and Helm-based application lifecycle management.
•Accelerated infrastructure provisioning by 70% by developing reusable Terraform modules and CloudFormation stacks to standardize VPC architectures, EC2 fleets, S3 lifecycle policies, and IAM roles across dev, staging, and production environments.
•Eliminated 95% of unauthorized access incidents by enforcing least-privilege IAM policies, implementing Service Control Policies (SCPs) across AWS Organizations, and automating compliance audits via AWS Config and CloudWatch Events.
•Scaled containerized workload capacity by 3x by migrating legacy workloads between ECS Fargate and EKS, cutting per-service infrastructure costs and increasing deployment frequency through ArgoCD-based GitOps pipelines.
•Decreased recurring platform support tickets by 35% by conducting systematic root cause analyses, authoring developer-facing runbooks, and delivering self-service troubleshooting training on AWS platform best practices.
•Increased operational efficiency by 50% by building Python and Bash automation scripts to handle routine platform tasks, including S3 data lifecycle management, EC2 instance scheduling, and VPC security group compliance auditing.
Florida Department of State Jun 2024 – Jun 2025
DevOps Infrastructure Engineer
•Directed incident management initiatives that reduced response time by 30%, while designing and deploying secure and scalable AWS cloud infrastructure solutions that improved system uptime by 25%.
•Developed and maintained Infrastructure-as-Code (IaC) using Terraform and Ansible to streamline deployment pipelines across multi-cloud platforms, with a focus on investigating AWS best practices and security compliance.
•Optimized system performance by 25% through capacity planning, load balancing, and fine-tuning observability with tools such as Prometheus and Grafana, ensuring reliable monitoring and logging.
•Automated operational tasks using Python and Bash scripting, reducing manual intervention time by 40% and reinforcing efficient Linux-based system administration.
•Managed containerization efforts by deploying Docker and Kubernetes, enhancing application reliability and scalability by 50% and aligning with cloud-native DevOps methodologies.
•Served as a key player in enhancing QA processes, achieving a 50% reduction in post-deployment defects, and contributed to a culture of continuous improvement in secure production environments.
Integrated Musculoskeletal Care Jun 2023 – May 2024
DevOps Infrastructure Engineer
•Implemented real-time performance monitoring on AWS using Lambda, Splunk, CloudWatch, and the ELK stack, which reduced incident response times by 30% through improved alerting and orchestration.
•Strengthened application reliability and scalability by 50% through the proficient deployment and management of Docker and Kubernetes clusters, in line with cloud-native practices.
•Collaborated with cross-functional teams to support CI/CD pipeline development and automation using GitLab, accelerating project delivery timelines by 20% through agile problem-solving and technical design reviews.
•Authored comprehensive system and testing documentation to ensure clarity and technical accuracy while aligning with departmental standards.
•Boosted deployment efficiency by 40% by automating cloud infrastructure deployments with AWS CloudFormation, Terraform, and Ansible, including the creation of reusable AMI templates, thereby enhancing secure and scalable environment setups.
Tata Consultancy Services Apr 2018 – Aug 2022
Software Engineer, Infrastructure & DevOps
•Engineered and optimized cloud-based solutions and large-scale distributed systems on AWS, Azure, and Google Cloud Platform (GCP) for high-volume projects, demonstrating proficiency in deploying and managing resilient services, cloud storage, and resources.
•Excelled in troubleshooting over 500 complex issues across Linux and Windows platforms, leveraging expertise in a wide array of networking concepts and routing protocols (HTTPS, TCP/IP, DHCP, SSH, VPN, DNS, NAT, VPC, OSI, etc.) and tools (traceroute, iperf, dig, curl), enhancing customer support efficiency by 40% and successfully diagnosing issues in production systems.
•Streamlined operational efficiencies by devising and implementing comprehensive cost optimization strategies, achieving a 25% reduction in expenses and surpassing 99.99% uptime through proactive management of alerts, logs, and scalability enhancements.
•Engineered Continuous Integration (CI) and Continuous Delivery (CD) pipelines using Jenkins, Azure DevOps, GitHub, AWS CodePipeline, and AWS CodeDeploy across multiple projects, reducing deployment time by 60%, and optimized AWS EC2 (Linux) workloads with Terraform, Puppet, and Chef, enhancing resource efficiency.
•Proficient in utilizing Git version control system for managing and collaborating on codebases, writing security access control using IAM policies/roles, and managing functional requirements and technical specifications.
•Enhanced SQL Server database performance and security in Azure by 30% using DevOps practices, automated tuning, and CI/CD pipelines, boosting system reliability and efficiency.
•Improved database design, complex SQL queries, and stored procedures for efficient data retrieval and manipulation, streamlining data processing workflows; developed normalized database schemas for optimal performance and collaborated closely with analysts to provide optimized SQL solutions for accurate data-driven insights.
•Engineered robust auto-scaling and load-balancing solutions on AWS, achieving 99.9% availability and significantly improving application scalability and reliability.
•Engineered serverless applications by harnessing AWS Lambda, API Gateway, S3, and DynamoDB to optimize cloud architecture for enhanced efficiency and scalability in various projects.
•Pioneered the migration of 5+ critical legacy systems to AWS using cloud services such as EC2, ECS, S3, and DynamoDB, which enhanced system scalability and reduced operational costs by 25%.
•Engineered innovative solutions to complex challenges by adhering to established engineering principles and practices, while developing novel methods and strategies for unique technical solutions.
•Proficient in shell scripting (Bash, PowerShell) and the command-line interface on Linux/Unix and Ubuntu; provided customer-facing technical support and created technical documentation and comprehensive training for provisioned web servers.
•Conducted root cause analysis of critical production errors, triaging and resolving production problems through automated testing frameworks, continuous monitoring with Prometheus and Grafana, and effective troubleshooting techniques, reducing downtime by 60% across 10+ production environments while providing professional customer service and maintaining clear communication with stakeholders.
•Managed the ticketing system for incident reporting and resolution, ensuring thoroughness and accountability in addressing critical events.
EDUCATION
Florida State University
Master's, Computer Science (CS)
Coursework: Artificial Intelligence, Concurrent Parallel Distributed Programming, Data and Computer Communications, Computer and Network Administration, Advanced Data Science, Machine Learning, Software Development Life Cycle.
PROJECTS
Cloud Infrastructure Automation and Deployment
•Designed a highly scalable, automated infrastructure deployment pipeline using Terraform, Ansible, and Azure DevOps, while incorporating Packer for image creation.
•Deployed Amazon Elastic Kubernetes Service (Amazon EKS) with end-to-end observability tools to ensure reliability and fault tolerance, including dashboarding solutions for performance tracking.
•Reduced manual provisioning by 35% through Infrastructure as Code (IaC) practices, improving efficiency and contributing to the overall management system.
DevSecOps CI/CD Project
•Developed GitHub Actions workflows for automated unit testing, static code analysis, and Docker image management, enhancing team collaboration.
•Integrated container scanning with Trivy for vulnerability detection and ensured secure deployment through GitHub Container Registry.
•Created a multi-stage Dockerfile to optimize build processes and reduce image size while establishing best practices.
•Configured Argo CD for continuous delivery, enabling automated application deployment to Kubernetes clusters, actively promoting Solution Delivery methodologies to support global organizational goals.