Site Reliability Engineering Manager

Company:

The Cypress Group

Location:

Manhattan, NY, 10261

Posted:

May 18, 2024

Description:

Job Title: Manager of Site Reliability Engineering

Company Overview:

Join a leading global payment processing firm revolutionizing the industry through cutting-edge technology and a commitment to excellence. We are dedicated to providing seamless, secure, and efficient payment solutions to clients worldwide. As we continue to expand and innovate, we are seeking a talented Manager of Site Reliability Engineering to lead our team and drive our technical operations forward.

Position Overview:

As the Manager of Site Reliability Engineering, you will play a pivotal role in ensuring the reliability, scalability, and performance of our high-transaction payment processing systems. You will initially lead a team of three experienced engineers, with the goal of growing it to as many as ten, and you will oversee incident management, disaster recovery, and technical guidance for our evolving infrastructure. Our environment is transitioning from a monolithic PHP architecture to a .NET microservices framework hosted on AWS (with openness to other cloud platforms), so your expertise in cloud-native solutions and hands-on leadership will be instrumental in our success. The role is remote, with a preference for candidates located on the East Coast to facilitate communication with our offshore teams.

Responsibilities:

Lead and mentor a team of Site Reliability Engineers, fostering a culture of collaboration, innovation, and continuous learning.

Oversee incident management processes, ensuring rapid resolution and minimal disruption to operations.

Develop and implement disaster recovery plans to safeguard against potential disruptions and minimize downtime.

Provide technical leadership and guidance in the transition from a monolithic PHP environment to a .NET microservices architecture.

Collaborate closely with development teams to optimize system performance, scalability, and reliability.

Evaluate and implement cloud-native solutions on AWS, driving efficiency and cost-effectiveness.

Conduct interviews and actively participate in the hiring process to build and grow a talented engineering team.

Continuously monitor system health, performance metrics, and security protocols, implementing improvements as needed.

Stay abreast of industry trends, emerging technologies, and best practices in Site Reliability Engineering.

Requirements:

Bachelor's degree in Computer Science, Engineering, or a related field; advanced degree preferred.

Proven experience in a leadership role within Site Reliability Engineering, preferably in a high-transaction environment.

Extensive hands-on experience with cloud platforms, particularly AWS; certification(s) a plus.

Strong proficiency in .NET development and microservices architecture; familiarity with PHP advantageous.

Expertise in incident management, disaster recovery planning, and technical troubleshooting.

Excellent communication skills, including the ability to collaborate effectively with cross-functional and offshore teams.

Demonstrated experience in hiring, mentoring, and developing engineering talent.

A passion for innovation, problem-solving, and driving continuous improvement.

Highly organized, detail-oriented, and able to thrive in a fast-paced, dynamic environment.

Location: Remote (East Coast highly preferred)

Compensation: Up to $245,000 annually + bonus
