Cloud Site Reliability Engineer Manager (Remote USA)

Company: #OpenToWork Careers
Location: Fort Worth, TX
Posted: July 22, 2021

Description:

Overview:

**Must have cloud software deployment experience.

**Must have experience managing a small team.

The Cloud Site Reliability Engineer Manager is critical to the success of the company and its customers. The SRE will bridge the gap between Cloud Ops and the Product/Dev teams, focusing on improving the delivery of our products to customers. This role works closely with the product dev teams, participating in weekly sprint planning to provide support and consulting and to advocate for the improvements needed to deliver a world-class hosting experience. The SRE will be responsible for building systems and tooling that enable the dev teams to work more efficiently while keeping a cloud-first mentality. This is an internal, product-facing role that collaborates closely with development teams, support teams, architects, and peer engineers on the planning, development, and implementation of solutions for various systems.

Responsibilities

• Provide thought leadership that moves the team toward the SRE Mission Statement

• Manage the Team’s weekly project work through status updates against deadlines

• Drive monthly reports from each SRE on their business unit’s uptime and reliability metrics

• Handle HR duties: review and approve time off, coordinate schedules, and perform annual reviews

• Design processes for improving operational stability of the Cloud

• Identify, document, and help resolve performance and operational efficiency challenges

• Create tooling with documentation to scale the Cloud

• Validate and enforce best application security practices

• Participate in incident management on-call rotation and drive root cause analysis

• Collaborate with engineering teams, product owners and other stakeholders to develop tooling and CI/CD procedures

• Continual development of monitoring tools and best practices

• Help drive capacity requirements and planning

• Ability to function in a DevOps atmosphere

• Support and manage cloud infrastructure and environments (AWS, Azure, IBM, Private Cloud)

• Ability to drive standardization across the team by building repeatable processes to ensure Cloud Stability

• Comply with security standards and technical design

• Comply with ITSM standards and practices

Qualities and Skills Required

• Bachelor’s degree in Computer Science, Engineering, or IS

• 5+ years’ experience in a 24/7 high-availability production Cloud environment

• Experience with configuration management and automation tools such as Ansible, Terraform, vRA, etc.

• Experience with CI/CD tools and implementing best practices

• Strongly prefer prior experience in Microsoft Windows (Server and Guest OS)

• Experience with virtualization technologies such as VMware

• Experience with Active Directory, PowerShell, SQL, Microsoft Remote Desktop Services

• Experience with configuring and extending monitoring tools

• A background in automating the management of a data center environment

• Experience with cloud-based IaaS (AWS, IBM Cloud)

• Good understanding of the Software Development Lifecycle

• Excellent analytical and problem-solving skills

• A passion for system stability, performance, scalability and customer success

• Ability to work with minimal supervision, making decisions based upon priorities, schedules and an understanding of business initiatives

• Strong interpersonal and team building skills

• The desire to take advantage of training and learning opportunities
