Senior Site Reliability Engineer (SRE)

Company:

Metasys Technologies

Location:

Paradise, NV, 89105

Posted:

May 14, 2025

Description:

Senior Site Reliability Engineer (SRE)

Remote Position

6+ Month Contract

Our client is seeking a highly skilled Senior Site Reliability Engineer (SRE) to join their team and help ensure the reliability, availability, and scalability of their systems. As a Senior SRE, you will work closely with development, operations, and security teams to build, monitor, and improve infrastructure and application performance while implementing best practices in automation and incident management.

Key Responsibilities:

System Reliability & Performance

Ensure high availability and reliability of applications and infrastructure.

Design and implement robust monitoring, logging, and alerting systems.

Conduct performance tuning and capacity planning to optimize system efficiency.

Automation & Infrastructure as Code (IaC)

Develop and maintain automation tools to manage deployments and configurations.

Implement Infrastructure as Code (IaC) using tools like Terraform, Ansible, or CloudFormation.

Automate manual operational tasks to improve efficiency and reduce downtime.

Incident Management & Troubleshooting

Participate in on-call rotations to quickly resolve incidents and prevent recurrence.

Perform root cause analysis (RCA) for production incidents and drive post-mortem reviews.

Develop and document runbooks to standardize response procedures.

DevOps & CI/CD

Work closely with development teams to implement CI/CD pipelines for faster and safer deployments.

Optimize build and deployment workflows using Jenkins, GitHub Actions, or similar tools.

Ensure security and compliance best practices are embedded in the deployment process.

Cloud & Infrastructure Management

Manage and optimize cloud-based infrastructure (AWS, Azure, GCP).

Implement container orchestration solutions using Kubernetes and Docker.

Ensure security best practices for cloud-based environments, including IAM and network security.

Required Skills & Qualifications:

Technical Expertise

Strong experience in Linux/Unix system administration.

Hands-on experience with Kubernetes, Docker, and cloud platforms (AWS, Azure, or GCP).

Proficiency in Terraform, Ansible, or CloudFormation for Infrastructure as Code.

Monitoring & Observability

Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, Datadog, or Splunk.

Automation & Scripting

Strong scripting skills in Python, Bash, or Go.

Expertise in automating operational tasks and workflows.

Incident Management & Troubleshooting

Ability to analyze system failures and implement preventive solutions.

Experience with incident response and root cause analysis.

CI/CD & DevOps Practices

Experience with CI/CD tools such as Jenkins, GitLab CI/CD, or GitHub Actions.

Familiarity with GitOps methodologies and release automation.

Security & Compliance

Knowledge of network security, IAM, and compliance frameworks such as SOC 2 and ISO 27001.

Preferred Qualifications:

Experience in SaaS, fintech, or high-scale distributed systems.

Certifications in AWS, Kubernetes (CKA/CKAD), or Terraform.

Familiarity with service mesh technologies like Istio or Linkerd.

Metasys Technologies is an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.
