Hybrid | Contract-to-Hire | $200K+ upon conversion | On-call 24/7
We're seeking an experienced Outage Incident Manager to take charge during critical IT service disruptions. This role is responsible for leading high-impact incident response efforts, driving service restoration, and ensuring clear communication across business and technical teams. The ideal candidate thrives under pressure, communicates with precision, and continuously improves operational processes to minimize downtime and protect business continuity.
What You'll Do
Major Incident Leadership
Own and coordinate the full lifecycle of major incidents from detection to resolution.
Lead cross-functional response teams to troubleshoot, escalate, and resolve critical outages.
Serve as the command center during disruptions: organizing incident bridges, setting priorities, and keeping everyone aligned.
Ensure adherence to SLAs and internal response protocols.
Communication & Coordination
Act as the primary communication point for all outage-related updates, keeping stakeholders and executives informed in real time.
Run incident calls and postmortem reviews to ensure full transparency and follow-through.
Deliver post-incident summaries including root cause, business impact, and action plans.
Process & Continuous Improvement
Analyze incidents to identify recurring issues and areas for improvement.
Collaborate with Problem and Change Management teams to reduce recurrence and risk.
Recommend and implement enhancements to monitoring, escalation, and communication workflows.
Operational Readiness
Maintain a constant state of readiness through regular training, tabletop exercises, and tool optimization.
Measure performance through KPIs, trend reports, and effectiveness metrics.
Partner with IT Operations and engineering teams to strengthen overall reliability.
What You Bring
Bachelor's degree in IT, Computer Science, or equivalent hands-on experience.
5+ years in IT operations, infrastructure, or service management.
At least 2 years of direct experience handling major incident or outage response.
Solid understanding of ITIL processes and service delivery best practices.
Strong leadership presence, capable of staying calm and decisive in high-pressure situations.
Excellent verbal and written communication skills for both technical and executive audiences.
Experience with tools such as ServiceNow, PagerDuty, Jira Service Management, or similar.
Familiarity with cloud platforms (AWS, Azure, GCP) and observability tools preferred.
Preferred
On-call rotation experience (after-hours/weekend support).
Certifications such as ITIL 4, PMP, or Cloud Practitioner (AWS/Azure).