Site Reliability Engineer (OpenShift / Platform Engineering)

Company: Career Developers
Location: Reston, VA
Posted: April 07, 2026

Description:

Site Reliability Engineer (OpenShift / Azure Platform Engineering)

Salary: 180K–190K

Must have: Red Hat OpenShift, SRE / Platform Engineering experience, observability tools, automation and scripting (Python/Bash), incident response, and Kubernetes / container platforms. Azure cloud is a plus.

Responsibilities:

Maintain the overall health, performance, and reliability of core technical platforms across on-premises data centers and Azure cloud environments.

Lead incident response, root-cause analysis, and post-incident remediation to ensure long-term platform stability.

Support and optimize Red Hat OpenShift environments, including cluster management, routing, operators, and platform services.

Manage and improve observability using tools such as Grafana, Prometheus, and Datadog to ensure full platform visibility.

Build and maintain CI/CD pipelines and automated deployment workflows supporting engineering and data teams.

Partner with development teams to resolve deployment, configuration, and routing issues.

Develop automation scripts and tooling to reduce manual intervention and improve system reliability.

Deliver new platform services and enhancements across hybrid infrastructure environments.

Maintain accurate documentation, runbooks, and incident response procedures.

Participate in an on-call rotation supporting production systems.

Requirements:

Bachelor's degree in Computer Science or related field, or equivalent experience.

5–7+ years of Site Reliability Engineering, Platform Engineering, or DevOps experience.

Strong experience with Red Hat OpenShift / Kubernetes platforms.

Experience with Microsoft Azure cloud services.

Experience with messaging platforms such as AMQ, Kafka, or Redis.

Experience with HashiCorp Vault or similar secrets management tools.

Strong scripting skills in Python, Bash, or PowerShell.

Experience with observability tools such as Datadog, Grafana, and Prometheus.

Strong troubleshooting skills using logs, metrics, traces, and debugging tools.

Experience working in regulated or highly audited environments preferred.

Ability to manage multiple priorities in a fast-paced environment.

Strong written and verbal communication skills.
