Senior DevOps Engineer / SRE
Location: Remote
Duration: 12+ months
Overview
We are seeking a highly skilled Senior DevOps Engineer / Site Reliability Engineer (SRE) to lead the design, implementation, and reliability of scalable cloud infrastructure. This role focuses on ensuring high availability, performance optimization, and automation across AWS environments.
The ideal candidate will bring deep expertise in AWS, monitoring, and automation, with a strong SRE mindset to support mission-critical applications in a 24/7 production environment. You will work closely with engineering and operations teams to build resilient systems, improve observability, and drive operational excellence.
Required Skills
Strong hands-on experience with AWS cloud services and infrastructure management
Experience implementing alerts, alarms, and notifications using CloudWatch and/or Dynatrace
Experience working with Kafka on AWS (for example, Amazon MSK) and with AWS container services such as ECS and EKS
Expertise in Infrastructure as Code (IaC) using Terraform or AWS CDK
Strong background in automation and configuration management
Experience with CI/CD pipelines (Jenkins, Azure DevOps, or similar tools)
Proven Site Reliability Engineering (SRE) experience in production environments
Strong Linux system administration and OS-level troubleshooting skills
Experience supporting 24/7 production environments, including incident response and root cause analysis (RCA)
Solid understanding of monitoring, observability, and performance tuning
Experience with networking fundamentals (TCP/IP, DNS, load balancing)
Preferred Skills
AWS certifications (DevOps Engineer or Solutions Architect)
Experience with Ansible, Python scripting, or other automation tools
Familiarity with high availability (HA) and disaster recovery (DR) architectures
Experience with container orchestration and microservices architecture
Knowledge of security best practices and vulnerability management tools
Experience working in enterprise-scale environments
Exposure to Java/.NET application deployments
Understanding of databases (SQL Server, Oracle)
Strong troubleshooting and problem-solving skills across infrastructure and applications
Experience with multi-region / multi-AZ AWS deployments