What’s needed to succeed (Minimum Qualifications):
Bachelor's degree in Computer Science, Information Systems, or related field
10+ years of experience in IT operations, data analytics, observability, or related fields
ITIL v4 Certification
Site Reliability Engineering (SRE) Certification
Certified Kubernetes Administrator (CKA)
AWS/Azure/GCP Cloud Certifications
Data Analytics Certifications (e.g., Google Data Analytics, Microsoft Certified: Power BI Data Analyst)
Proven ability to maintain a high level of client service
Expertise in IT performance monitoring tools (e.g., Splunk, Grafana, Datadog, New Relic, Dynatrace)
Strong knowledge of incident management, ITIL best practices, and site reliability engineering (SRE) principles
Hands-on experience with data visualization platforms (e.g., Power BI, Tableau, Looker)
Proven track record of reducing MTTR and improving system reliability through data-driven initiatives
United's Digital Technology team comprises many talented individuals working together with cutting-edge technology to build the best airline in the history of aviation. Our team designs, develops, and maintains massively scalable technology solutions brought to life with innovative architectures, data analytics, and digital solutions.
Job overview and responsibilities
The Director of Enterprise Site Reliability and Enablement is responsible for driving operational excellence through data-driven insights, real-time dashboarding, and reliability initiatives. This role ensures IT systems operate efficiently, with a strong focus on incident response, observability, and Mean Time to Resolve (MTTR) improvement. The ideal candidate will have a deep understanding of IT operations, data analytics, and performance monitoring tools to proactively enhance service reliability and decision-making.
Operational Leadership:
Provide clear executive decision-making and priority management; coach the team and build its confidence to make good business decisions using technology and analytical thinking; proactively plan for, communicate, and mitigate risks across stakeholders
Achieve operational excellence and a superior user experience by building a high-performing team that meets and exceeds its goals and objectives
Drive continuous improvement of infrastructure, tools, and processes, working with cross-functional teams in support of campaigns and projects, analytics, reporting, and business intelligence
Work seamlessly with other Digital Technology and business unit leaders to architect and build best-in-class solutions and experiences
Operational Data Analytics & Dashboarding:
Develop and manage real-time dashboards to visualize IT performance, system health, and reliability metrics
Leverage data analytics to identify trends, detect anomalies, and drive continuous improvements in IT operations
Standardize reporting processes for IT operations KPIs, including MTTR, uptime, SLAs, and incident volume
Implement AI/ML-driven analytics to predict and prevent IT failures before they impact business operations
Reliability & Incident Management:
Lead initiatives to improve IT system reliability, reducing downtime and optimizing service performance
Drive MTTR improvement strategies by enhancing incident response processes, automation, and root cause analysis (RCA)
Implement observability solutions to provide end-to-end visibility across infrastructure, applications, and services
Collaborate with engineering and DevOps teams to optimize system performance and availability
IT Operations Strategy & Continuous Improvement:
Define and execute strategies for operational resilience, ensuring high availability and performance
Introduce process automation and AIOps solutions to enhance IT efficiency and reduce manual effort
Align IT operations with business objectives, ensuring proactive issue resolution and continuous service optimization
Collaboration & Leadership:
Work cross-functionally with infrastructure, DevOps, security, and business teams to drive operational excellence
Present data-driven insights to executive leadership, influencing IT strategy and decision-making
Foster a culture of accountability, innovation, and continuous learning within IT operations