Production Support Lead

Company:
RIT Solutions, Inc.
Location:
Summit, NJ, 07902
Posted:
December 03, 2025

Description:

Job title: Production Support Lead

Location: Newark, NJ

Onsite: 3 days a week

Interview process: 2 video interviews to hire

Contract: 12 months to perm

Must have:

Communication is extremely important here

Lead experience

Build and design

COE

SLA

Salesforce or cloud experience

Job Description:

We are seeking an experienced Production Support Lead to manage and enhance the stability, performance, and availability of our client-facing applications. This role requires a proactive leader who can guide a dedicated support team, collaborate with engineering teams, communicate with stakeholders, and manage incidents effectively to minimize downtime and improve user experience.

Key Responsibilities:

Incident Management and Resolution:

- Oversee the triage, investigation, and resolution of production issues, ensuring timely communication and status updates

- Manage incident response efforts, including documentation, root cause analysis, and post-incident reviews to identify preventative actions

- Establish clear escalation protocols and ensure adherence to service level agreements (SLAs)

- Coordinate resolution and follow-ups with dependencies outside the immediate team

- Coordinate knowledge transfer (KT) sessions between development teams and L1/L2 triage to establish runbooks and a knowledge base

Team Leadership and Coordination:

- Coordinate with development, QA, and infrastructure teams to ensure seamless issue resolution and knowledge sharing

- Foster a strong ownership mindset within the team, ensuring accountability for system health and stability

Monitoring and Alerting:

- Define and maintain effective monitoring solutions in partnership with development teams to proactively identify and address potential issues

- Continuously improve observability by implementing dashboards, alerts, and automated health checks in partnership with development teams

Process and Documentation:

- Develop and maintain detailed runbooks, SOPs, and knowledge base articles to ensure consistent response procedures

- Establish best practices for incident response, including communication templates and decision frameworks

Stakeholder Communication:

- Serve as the primary point of contact for production issues affecting client experiences

- Provide clear, concise updates to leadership, internal teams, and clients during incidents and post-incident reviews

Continuous Improvement:

- Identify patterns in recurring incidents and partner with development teams to implement permanent fixes

- Drive initiatives to enhance system reliability, scalability, and performance

Qualifications and Skills:

- Proven experience in a production support leadership role for client-facing applications

- Strong understanding of incident management frameworks

- Proficiency in troubleshooting application, database, and infrastructure issues

- Familiarity with monitoring tools such as Dynatrace, Datadog, and Splunk

- Familiarity with incident management platforms such as ServiceNow

- Ability to prioritize tasks effectively and communicate technical concepts to non-technical stakeholders

- Excellent problem-solving skills and a calm, solution-focused approach under pressure

- Experience working in AWS

- Familiarity with CI/CD pipelines and release management processes

Preferred:

- Background in software development or scripting for automation

- Previous experience in the financial services industry

Success Metrics:

- MTTA: Mean time to acknowledge

- MTTR: Mean time to resolve

- Stakeholder satisfaction with incident communication

- Knowledge base usage rate and coverage

- Number of issues handed over to L1/L2, EMKT teams

- Number of system-identified vs. user-reported alerts, and trends over time

- Enhancements and alerts requested

- Minimize the number of user-reported incidents

- Number of incidents resolved at L1/L2 without the app support team
