Mainframe Site Reliability Engineering (SRE)

Company:

IT America Inc

Location:

Worthington, OH, 43085

Posted:

April 25, 2026

Description:

Position: Mainframe Site Reliability Engineering (SRE)

Location: Columbus, OH (Onsite)

Duration: Long term contract

We need a candidate from mainframe production support with hands-on experience in L2/L3 production support.

Role Summary:

The Mainframe SRE is responsible for ensuring the reliability, availability, performance, and scalability of enterprise mainframe platforms. This role blends traditional mainframe engineering with modern SRE principles, focusing on automation, observability, incident management, and continuous improvement. The lead will guide a team of engineers while partnering closely with application, infrastructure, and operations teams.

Key Responsibilities:

Lead the Mainframe SRE team, providing technical direction, mentoring, and performance guidance

Own the reliability, availability, and resilience of mainframe environments (z/OS and related subsystems)

Define and implement SRE practices such as SLIs, SLOs, SLAs, error budgets, and reliability metrics

Drive automation to reduce manual operations, improve recovery time, and enhance system stability

Oversee monitoring, alerting, and observability for mainframe systems using modern and legacy tools

Lead incident management, root cause analysis (RCA), and post-incident reviews

Partner with application development teams to improve reliability, performance, and deployment practices

Plan and execute capacity management, performance tuning, and workload optimization

Ensure compliance with security, regulatory, and audit requirements

Lead disaster recovery (DR) planning, testing, and high-availability strategies

Champion continuous improvement, DevOps, and SRE culture within mainframe operations

Required Qualifications:

10+ years of experience in mainframe systems engineering or operations

Strong hands-on expertise with IBM z/OS

Experience with core mainframe components such as:

CICS, IMS, Db2

JES2/JES3

MQ, SMF, SDSF

Solid understanding of mainframe performance tuning and capacity planning

Experience leading production support and managing major incidents

Strong scripting and automation skills (REXX, JCL, CLIST, Python, or equivalent)

Familiarity with monitoring and scheduling tools (e.g., OMEGAMON, CA/BMC tools, Control-M)

Preferred Qualifications:

Experience applying SRE principles in a mainframe or hybrid (mainframe + distributed) environment

Exposure to DevOps, CI/CD, and automation frameworks

Knowledge of Linux on Z and cloud integration patterns

Experience with resilience engineering, chaos testing, or fault injection concepts

Prior people-lead or technical-lead experience

Permanent
