Srija Reddy
Sr DevOps/SRE Engineer
AWS Certified DevOps Engineer – Professional
Email: **********@*****.*** Phone: +1-682-***-****
Professional Summary:
DevOps Engineer with 8+ years of experience in designing scalable CI/CD pipelines, cloud-native infrastructure, and automation across AWS, GCP, and Azure.
• Designed and deployed microservices using AWS ECS, Lambda, and API Gateway with automated CI/CD pipelines.
• Built and managed GCP infrastructure using Terraform, applying IaC best practices for scalable environments.
• Developed CI/CD workflows across Jenkins, GitHub Actions, Azure DevOps, and GCP Cloud Build.
• Implemented container orchestration with Kubernetes (EKS, GKE, AKS) and managed Helm-based deployments.
• Monitored system health with Prometheus, Grafana, SkyWalking, and alerting via Opsgenie.
• Configured log aggregation and observability using the ELK Stack, Datadog, Splunk, and Honeycomb.
• Set up application performance monitoring and tracing using Datadog, SkyWalking, and Grafana.
• Managed secrets and access using AWS IAM, GCP Secret Manager, and Azure Key Vault with RBAC policies.
• Designed real-time and batch ETL pipelines using Apache Airflow, AWS Glue, and GCP Dataflow.
• Engineered data lakes and warehouses with Amazon S3, Redshift, BigQuery, and Athena for analytics.
• Transformed raw data using Python, SQL, and ingestion tools across APIs, files, and relational databases.
• Developed scalable RESTful APIs in Java with Spring Boot, documented using Swagger for seamless integration.
• Applied core Java principles including multithreading, collections, and exception handling for backend services.
• Created and optimized build pipelines using Maven, Gradle, Azure DevOps, and artifact repositories.
• Deployed microservices in Docker containers with GitOps workflows powered by ArgoCD.
• Implemented microservices architecture using Spring Boot, Spring Cloud, and secured endpoints with Spring Security.
• Applied design patterns like DAO, Singleton, and Service Locator in enterprise applications using Java EE and MVC architecture.
• Managed incident workflows and service requests via ServiceNow, Freshservice, and Jira Service Management.
Managerial Summary:
Results-driven DevOps and Cloud Engineer with a proven record of delivering scalable, secure, and automated cloud infrastructure across Azure, AWS, and GCP environments. Adept at leading end-to-end cloud projects covering CI/CD automation, system monitoring, and platform hardening to drive uptime, efficiency, and compliance. Known for partnering with cross-functional teams to modernize legacy systems, optimize cloud spend, and mentor teams on cloud-native best practices. Delivered measurable outcomes such as 60% reduction in manual interventions and 35% faster incident triage through unified observability platforms.
Certification:
AWS Certified DevOps Engineer – Professional
Professional Experience:
Sr DevOps Engineer/SRE
Michaels, Irving, TX
05/2023 - Current
At Michaels Stores, Inc., I was responsible for building and maintaining the end-to-end e-commerce platform across development, UAT, and production environments using GCP services. This included infrastructure provisioning, CI/CD automation, security enforcement, and observability implementation. I also monitored and supported critical business systems like self-checkout and B2B applications, ensuring high availability, performance, and compliance. My role bridged cloud engineering, DevOps, and data platform operations to deliver scalable and secure digital retail experiences.
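For illustration, the threshold-style alerting logic behind the custom monitoring policies described in this role can be sketched in a few lines of Python. This is a toy sketch only: the metric name, threshold, and breach count are invented for the example, not actual Michaels configuration.

```python
# Minimal sketch of a threshold-based alert check, similar in spirit to a
# Cloud Monitoring alert policy that pages on-call via Opsgenie.
# All names and thresholds here are hypothetical, for illustration only.
from dataclasses import dataclass


@dataclass
class AlertPolicy:
    metric: str          # e.g. "checkout/p99_latency_ms" (invented name)
    threshold: float     # fire when the metric exceeds this value
    min_breaches: int    # consecutive breaching samples required before paging


def should_page(policy: AlertPolicy, samples: list[float]) -> bool:
    """Return True when the last `min_breaches` samples all exceed the threshold."""
    if len(samples) < policy.min_breaches:
        return False
    recent = samples[-policy.min_breaches:]
    return all(s > policy.threshold for s in recent)


policy = AlertPolicy(metric="checkout/p99_latency_ms", threshold=500.0, min_breaches=3)
print(should_page(policy, [510, 620, 700]))  # True: three consecutive breaches
print(should_page(policy, [510, 120, 700]))  # False: the streak was broken
```

Requiring several consecutive breaches before paging is a common way to suppress flapping alerts; a managed alerting product expresses the same idea as an alignment period and duration.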
• Built and managed GCP infrastructure including Cloud Storage, Pub/Sub, BigQuery, GKE (Autopilot & Standard), and Anthos across dev, UAT, and prod environments.
• Configured and maintained CI/CD pipelines using Jenkins, GitHub Actions, JFrog Artifactory, and ArgoCD across all environments for streamlined deployments.
• Integrated OpenText TeamSite into CI/CD pipelines to automate publishing workflows, enabling seamless content delivery to GCP Cloud Storage, web servers, and hybrid endpoints using Ansible and Python scripts.
• Built CMS deployment automation for TeamSite and LiveSite using custom Bash and Terraform scripts, improving release frequency and reducing manual deployment errors by 70%.
• Integrated Ansible playbooks into Jenkins for automated provisioning and deployments.
• Managed legacy server configurations with Chef during the cloud migration phase.
• Automated RedHat Linux OS patch management using Ansible integrated with Jenkins pipelines, reducing manual intervention by 60%.
• Integrated SonarQube with Jenkins pipelines for static code analysis and quality gates, improving code maintainability and reducing technical debt.
• Developed bash scripts for log rotation, user access audits, and kernel parameter tuning across UAT and production nodes.
• Automated IAM policies and GCP resource provisioning using Terraform, ensuring consistency and compliance via Infrastructure as Code (IaC).
• Designed and managed GCP firewall rules for internal services and external APIs, implementing layered security zones and strict ingress/egress policies across VPCs.
• Integrated Prisma Cloud for runtime security, vulnerability management, and compliance monitoring across GKE workloads and CI/CD pipelines.
• Implemented cloud-native vulnerability scanning pipelines using Prisma Cloud and custom Python automation, reducing high/critical security issues in production environments by 40%.
• Integrated Kong API Gateway to manage traffic routing, authentication, and rate limiting for microservices deployed on GKE, improving the security and reliability of service-to-service communication and streamlining service onboarding.
• Deployed ELK stack and Datadog in GCP for centralized logging and integrated Stackdriver (Cloud Monitoring) and Prometheus for advanced observability.
• Built and maintained Grafana dashboards and Opsgenie alerting for real-time monitoring of infrastructure, application health, and business KPIs.
• Integrated Splunk and Datadog with GCP and Kubernetes environments to deliver comprehensive observability, incident detection, and troubleshooting workflows for production middleware services.
• Created custom Google Cloud Monitoring alert policies for DBAs and business-critical components; integrated alerting with Opsgenie to trigger on-call rotations.
• Worked on GTM (Global Traffic Manager) and LTM (Local Traffic Manager), including traffic steering, load balancing, and DNS-based routing.
• Prioritized security and compliance by enforcing IAM role bindings, VPC Service Controls, and secure service-to-service communication.
• Ensured incidents and alerts automatically triggered tickets in ServiceNow and Jira, enabling structured tracking and resolution workflows.
• Created and maintained detailed Confluence documentation for all infrastructure components and operational procedures.
• Conducted RCA and performance tuning of GCP-based services, including GKE workloads and Dataflow pipelines, to ensure high availability and cost optimization.
• Developed and deployed scalable machine learning models on Vertex AI, leveraging AutoML and custom training pipelines for improved prediction accuracy and reduced training time.
• Collaborated with data engineering teams to optimize BigQuery queries and data models—improving performance and reducing cost.
• Designed and orchestrated Dataproc jobs for Looker report generation, scheduled via Cloud Composer (Airflow) for daily business reporting.
• Built scalable microservices using Java and Spring Boot, integrated with GCP services.
• Tuned and debugged Java applications in GKE for optimal runtime performance.
• Handled all production deployments, working closely with developers and QA for smooth rollouts.
Sr Cloud Engineer
SLALOM, Seattle, WA
05/2022 - 05/2023
The project focused on building a scalable, cloud-native infrastructure on GCP to support data engineering and microservices. It leveraged Kubernetes, CI/CD with GitHub and Jenkins, and automated ML workflows using Airflow and Prefect. Observability was ensured with Stackdriver, Prometheus, and Grafana, while Istio and Anthos enabled secure and centralized management.
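The ML workflow automation mentioned above (Airflow, Prefect) reduces to running tasks in dependency order. A toy, library-free Python sketch of that idea follows; the task names are invented for illustration and a real DAG would use the orchestrator's operators instead of plain callables.

```python
# Toy illustration of dependency-ordered task execution, the core idea behind
# orchestrators like Airflow and Prefect. Task names are hypothetical.
from graphlib import TopologicalSorter


def run_pipeline(dag: dict[str, set[str]], tasks: dict) -> list[str]:
    """Execute tasks in an order that respects dependencies; return that order."""
    executed = []
    for name in TopologicalSorter(dag).static_order():
        tasks[name]()          # in Airflow this would be an operator's execute()
        executed.append(name)
    return executed


log = []
dag = {
    "ingest": set(),        # no upstream dependencies
    "train": {"ingest"},    # train only after ingest
    "deploy": {"train"},    # deploy only after train
}
tasks = {name: (lambda n=name: log.append(n)) for name in dag}
order = run_pipeline(dag, tasks)
print(order)  # ['ingest', 'train', 'deploy']
```

`graphlib.TopologicalSorter` (standard library, Python 3.9+) takes a mapping of node to predecessors, which matches how upstream dependencies are declared in most orchestrators.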
• Built and managed GCP infrastructure using Terraform templates sourced from GitHub repositories.
• Partnered with infrastructure teams to lead the Data Engineering team's cloud transformation and identify optimal storage solutions.
• Used GitHub, Jenkins, and Artifact Registry for version control, continuous integration, and secure artifact storage to streamline application deployment on GKE clusters.
• Developed and updated infrastructure stacks in GCP to adapt to changing organizational needs, ensuring scalable and reliable cloud resource management.
• Implemented Ansible for automated configuration management across environments and created custom dashboards using JIRA advanced filters to track team performance and sprint progress.
• Built custom Ansible playbooks to automate TeamSite/LiveSite component deployments across dev and UAT environments, reducing manual configuration errors and accelerating onboarding of new CMS modules.
• Enabled multi-cluster Kubernetes management by configuring Anthos and GKE Enterprise, leveraging Config Sync for centralized policy and configuration control.
• Integrated Istio service mesh to manage and secure traffic between microservices running on GKE, enabling features like traffic routing, observability, and mTLS.
• Set up and enforced GitHub branching policies across teams to maintain code quality and ensure smooth CI/CD processes with structured collaboration.
• Monitored system logs and configured custom alerts for GCP instances and GKE workloads using Stackdriver (now Cloud Operations) for proactive system health management.
• Participated in Agile development processes, including daily scrum meetings, to address blockers, align deliverables, and foster cross-functional collaboration.
• Built and managed scalable data platforms using S3, Athena, Redshift, and GCP-native tools to support analytics, reporting, and data warehousing requirements.
• Used Grafana, Prometheus, and PagerDuty to achieve full-stack observability, ensuring real-time monitoring, alerting, and rapid incident response.
• Supported service reliability using SkyWalking, Honeycomb, and ticketing systems such as ServiceNow, Freshservice, and Jira.
• Deployed models for real-time inference using TensorFlow Serving, TorchServe, and FastAPI. Ensured high availability and low-latency predictions in production systems.
• Automated ML pipelines with Airflow, Dagster, and Prefect to orchestrate data ingestion, model training, and deployment.
• Developed and maintained microservices in Java using Spring Boot framework, ensuring high performance and scalability.
• Implemented RESTful APIs and integrated third-party services to enhance application functionality and data exchange.
• Conducted unit and integration testing using JUnit and Mockito, improving code quality and reducing production bugs.
• Designed and optimized ETL/ELT pipelines for batch and real-time workloads using Apache Airflow, Python, and SQL.
• Extracted and transformed structured and semi-structured data into BigQuery, Cloud Storage, and Data Lakes.
• Implemented CDC and incremental loading strategies for high-availability data pipelines.
Cloud Infrastructure Engineer – AWS
Signify, Somerset, NJ
08/2021 - 04/2022
The project focused on building scalable AWS infrastructure using EC2, EKS, and CloudFormation to support broadband and cloud-native applications. CI/CD pipelines were automated with Jenkins and CodePipeline, while Terraform and Chef handled infrastructure as code. Applications were containerized with Docker and deployed on EKS, and data workflows were managed using AWS Glue and SageMaker.
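One of the diagnostics described in this role, querying VPC Flow Logs to find dropped packets, can be sketched in plain Python over default-format flow-log records (in practice this ran as Athena SQL over S3). The sample records below are fabricated for illustration.

```python
# Sketch of the kind of VPC Flow Log analysis done with Athena, here in plain
# Python over default-format records: count REJECTed flows per source address.
# Default format: version account-id interface-id srcaddr dstaddr srcport
# dstport protocol packets bytes start end action log-status.
from collections import Counter


def rejected_by_source(flow_log_lines: list[str]) -> Counter:
    """Tally REJECT actions by srcaddr (field 4 in the default flow-log format)."""
    rejects = Counter()
    for line in flow_log_lines:
        fields = line.split()
        srcaddr, action = fields[3], fields[12]
        if action == "REJECT":
            rejects[srcaddr] += 1
    return rejects


sample = [  # fabricated records for illustration
    "2 123456789012 eni-abc 10.0.1.5 10.0.2.9 443 49152 6 10 840 1620000000 1620000060 ACCEPT OK",
    "2 123456789012 eni-abc 10.0.3.7 10.0.2.9 22 49153 6 1 40 1620000000 1620000060 REJECT OK",
    "2 123456789012 eni-abc 10.0.3.7 10.0.2.9 22 49154 6 1 40 1620000100 1620000160 REJECT OK",
]
print(rejected_by_source(sample))  # Counter({'10.0.3.7': 2})
```

A spike of REJECTs from one source address usually points at a security-group or network-ACL rule rather than application latency, which is why this tally is a useful first triage step.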
• Designed and built AWS infrastructure using services such as EC2, RDS, and VPC, and worked with networking tools including Route 53, AWS Client VPN, AWS Direct Connect, and Elastic Load Balancer.
• Created and orchestrated CI/CD pipelines and maintained AWS services such as EKS to support 4G/5G and Wi-Fi broadband connectivity and provide a platform for multiple applications.
• Built and maintained secure AWS cloud infrastructure utilizing Chef with AWS CloudFormation.
• Architected and maintained AWS Network Firewall to enforce east-west and north-south traffic controls, reducing exposure to lateral threats.
• Developed automation scripts to manage firewall rules via Terraform modules, supporting environment-specific policy enforcement.
• Used AWS CodePipeline for continuous integration, server installation, and configuration, automating application packaging and deployments by integrating it with AWS CodeCommit.
• Executed serverless Athena queries over S3 logs and VPC Flow Logs to diagnose inter-service latency and dropped-packet issues, accelerating debug cycles by 40%.
• Used AWS CloudWatch, CloudTrail, and SNS for centralized logging, alerting, and monitoring across environments.
• Worked on EKS for application deployment and load balancing.
• Automated the infrastructure using Terraform and made it auditable by storing all infrastructure changes in a version control system like AWS CodeCommit.
• Developed utility scripts in PowerShell and Python for log rotation, snapshot cleanup, and service restart automation across EC2 and RDS instances, ensuring operational hygiene and reduced manual intervention.
• Worked on multiple areas of Jenkins like Plugin Management, Securing Jenkins, Performance issues, Analytics, Scaling, integrating Code Analysis.
• Deployed applications containerized using Docker onto EKS.
• Worked on AWS Elastic Beanstalk for fast deployment of various applications developed with Python.
• Managed Helm charts and Flux to build Kubernetes applications, templatize Kubernetes manifests, provide configuration parameters to customize deployments, and manage releases of Helm packages.
• Developed and maintained backend services and RESTful APIs using Java, enhancing system performance and reliability for cloud-native applications.
• Designed and implemented data pipelines and ETL processes for large-scale data ingestion, transformation, and analysis using tools like Apache Spark and AWS Glue.
• Used Presto, an open-source distributed SQL query engine, to run queries on data stored in Hadoop.
• Used SageMaker Studio and SageMaker Notebooks to perform interactive model development, data exploration, and experimentation.
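The snapshot-cleanup utilities mentioned above come down to selecting resources past a retention window. A library-free sketch of that selection logic follows; the snapshot IDs, dates, and 14-day window are invented, and a real script would list and delete snapshots through the AWS APIs (e.g. boto3).

```python
# Sketch of snapshot-retention logic behind a cleanup script: pick snapshots
# older than a retention window for deletion. IDs and dates are fabricated.
from datetime import datetime, timedelta


def expired_snapshots(snapshots: dict[str, datetime],
                      now: datetime,
                      retention_days: int = 14) -> list[str]:
    """Return snapshot IDs created before now - retention_days, sorted by ID."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(sid for sid, created in snapshots.items() if created < cutoff)


now = datetime(2022, 3, 1)
snapshots = {
    "snap-old": datetime(2022, 2, 1),    # 28 days old -> expired
    "snap-new": datetime(2022, 2, 25),   # 4 days old  -> kept
}
print(expired_snapshots(snapshots, now))  # ['snap-old']
```

Keeping the selection logic separate from the delete calls makes the script easy to run in a dry-run mode before it is allowed to remove anything.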
Infrastructure Engineer
Cummins Inc, Pune, India
06/2017 - 02/2019
The project focused on building Java-based microservices with Spring Boot, integrated into Azure DevOps for CI/CD. Infrastructure was automated using Terraform, Puppet, and Ansible, with deployments on Azure and Kubernetes. PL/SQL was used for backend data processing, and system reliability was ensured through monitoring, backups, and secure access management.
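The compliance checks described above (Puppet/Ansible runs over Windows and Linux hosts) boil down to detecting drift between a desired state and what a host reports. A toy Python sketch of that comparison follows; the setting names and values are hypothetical.

```python
# Toy illustration of a configuration-compliance check, the kind of drift
# detection Puppet/Ansible runs perform. Keys and values are hypothetical.
def compliance_drift(desired: dict[str, str], actual: dict[str, str]) -> dict:
    """Return settings that are missing or differ as {key: (wanted, found)}."""
    drift = {}
    for key, want in desired.items():
        have = actual.get(key)  # None when the setting is missing entirely
        if have != want:
            drift[key] = (want, have)
    return drift


desired = {"ntp_server": "time.corp.example", "ssh_root_login": "no"}
actual = {"ntp_server": "time.corp.example", "ssh_root_login": "yes"}
print(compliance_drift(desired, actual))  # {'ssh_root_login': ('no', 'yes')}
```

In a real run the drift report would feed remediation (Puppet converging the resource, or an Ansible task with `changed` status) rather than just being printed.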
• Designed and developed enterprise applications using Java, Groovy, and Spring Boot in a microservices architecture.
• Built and deployed Java applications using Maven, Ant, and integrated them into Azure DevOps pipelines.
• Used WebSphere for Java app deployment and supported production releases using Shell and Ansible scripts.
• Wrote PL/SQL queries and stored procedures to support backend data workflows for Java-based systems.
• Collaborated with cross-functional teams to implement secure and efficient Java solutions integrated with Oracle, SQL, and Teradata databases.
• Worked on reliability, customer support, and provided infrastructure solutions using Azure services.
• Created Azure infrastructure via Terraform to provision scalable cloud resources for various projects.
• Managed access and security rights across enterprise systems using Active Directory, Group Policy, and DNS.
• Implemented infrastructure as code using Puppet and deployed containerized workloads via Docker, Kubernetes, and Helm.
• Automated infrastructure provisioning and compliance checks using Puppet and Ansible for Windows and Linux workloads.
• Implemented WebLogic server maintenance and migration activities, ensuring legacy Java applications transitioned smoothly to modern platforms.
• Participated in SDLC lifecycle from requirements through deployment, with strong collaboration across development and QA teams.
• Designed and implemented efficient PL/SQL procedures, functions, and triggers for data validation and processing.
• Used N-Central for device management, patch deployment, and system monitoring across hybrid environments.
• Supported Veeam backup solutions and maintained firewall security using Fortigate appliances.
• Led infrastructure migration initiatives and ensured highly reliable outcomes across different cloud ecosystems.
• Created and maintained internal documentation covering infrastructure, automation tools, and security protocols.
• Developed provisioning and automation scripts using Python, Shell, and Ansible for CI/CD processes.
• Used tools like Git, JIRA, MS Excel, and MS Office for source control, issue tracking, and reporting.
Java Developer
Rushkar Technology Pvt. Ltd, Ahmedabad, India
04/2015 - 10/2017
The project involved developing web applications using Java, JSP, and Struts with MySQL for database management. It followed Agile practices and included front-end development with HTML and CSS. The team supported testing phases and managed deployments using ANT scripts.
• Worked in an Agile environment and participated in the software design life cycle process
• Contributed to both high-level and low-level design documentation for the application's process flow of control
• Participated in database design and wrote SQL queries using MySQL
• Utilized Java, Servlets, and JSP to create web-based modules
• Developed applications using the MVC design pattern and the Struts framework
• Utilized Eclipse IDE for project development
• Assisted in resolving issues during the System Integration Testing (SIT) and User Acceptance Testing (UAT) phases
• Used HTML, CSS3, JSP, and custom tag libraries to develop a dynamic web application
• Assisted in deploying the application using ANT scripting