Site Reliability Engineer

Company: Hale Recruiting
Location: New York, NY
Posted: April 29, 2024

Description:

Summary - Site Reliability Engineer (for one of the Big 4 Sports & Entertainment Leagues)

Our client is enhancing the landscape of the live sports and entertainment industry. They are striving to deliver innovative, cutting-edge technologies to enable safe, unforgettable fan experiences across the globe. They are assembling a world-class technology team to build and support platforms and products that anticipate these emerging opportunities.

The Data(base) Reliability Engineer will join the infrastructure team while also working alongside league team members and be responsible for the following areas:

Uptime, High Availability and Disaster recovery planning

Incident response

Optimization of data stores

Identify SLIs and define SLOs

Observability tooling

Debugging running systems and providing tools to assist runtime debugging

Optimizations for cost control

Ability to interface with all levels of employees

Ability to work both independently with little supervision and in a team environment

Ensure availability, security, integrity, and recovery of data, pipelines, and data stores

Define and configure relevant database metrics to ensure observability

Create and maintain dashboards and reports to visualize database performance and health

Create monitoring and alerting to trigger on error conditions, degradation symptoms, and defined SLOs, as well as outages

Develop and implement data store maintenance plans, including performing integrity checks, updating statistics, and monitoring security and hardware resource utilization

Work with peers to roll out changes to production environments and help mitigate and prevent data-related production incidents

Work on automation of data store infrastructure and help engineering succeed by providing self-service tools

Resolve performance, capacity, replication, and other distributed data, pipeline, and data store issues

Support and debug data production issues across services and levels of the stack

Provide timely incident response and participate in on-call rotations

Continuously identify opportunities for process improvement and automation to enhance database performance, reliability, and efficiency

Prioritize unblocking your teammates, collaboration and knowledge sharing

Qualifications:

To perform this job successfully, an individual must be able to perform the Duties and Responsibilities (Duties) above satisfactorily and meet the requirements below. The requirements listed below are representative of the minimum knowledge, skill, and/or ability required. Reasonable accommodations will be made to enable individuals with disabilities to perform the essential functions of the job.

Education and/or Experience: Required:

Minimum of a bachelor’s degree in Computer Science, MIS, or a related field and five (5) years of relevant experience, including software or reliability engineering, database administration, or data store programming, or an equivalent combination of education, training, and experience.

Ability to communicate strong opinions clearly and effectively on how to use technologies such as cloud, microframeworks, DevOps, automation, and observability tools

Demonstrable experience engineering automation of triggers, alerts, and remediation

Have written code in a compiled language that runs in production somewhere

Experience with Oracle 19c, Postgres, Mongo, Change Data Capture, and data and data store monitoring, management, and support

Experience with OLTP and OLAP, as well as PL/SQL code development and tuning

Experience with Linux and shell scripting

Extensive experience in performance tuning and analysis

Strong grasp of ITIL principles is a plus

Capacity planning for all aspects of a data store system (storage, compute, memory, etc.)

Understanding of networking and connectivity and how it relates to a data store environment

Excellent problem solving and troubleshooting skills

Ability to work non-standard shifts including nights and/or weekend on-call responsibilities

Dedicated to continuous improvement of yourself and our SRE/DBRE capabilities

Key Technical Traits

APIs and microservices: REST, Web, Graph

Database solutions: Oracle, MySQL, MSSQL, CloudSQL, NoSQL

Cloud Providers: Oracle Cloud Infrastructure, Google Cloud Platform, AWS

Real-time log/event monitoring: Datadog, Stackdriver, Oracle Enterprise Manager, Oracle Cloud Monitoring, SolarWinds, Splunk, Sumo Logic, OpenTelemetry

Scripting: PL/SQL, Shell

Secured access and control: Okta SSO and MFA, MS Active Directory, DataSafe

Software development tools: Jira, Git, Jenkins, ArgoCD, Terraform

Compliance: PCI DSS, SSAE18/SOC 1
