Senior Site Reliability Engineer

Company: CARFAX
Location: Columbia, MO
Posted: April 30, 2025

Description:

Join Team CARFAX as a Senior Site Reliability Engineer!

Isn't it time you bragged about where you work? At CARFAX, we do, every day. We pride ourselves on being mission-focused on helping to grow a brand built on accuracy and integrity. We care deeply about our products and our customers. We’re more than just a company: We help millions of consumers make more informed decisions every day. We know that our teammates are our most valuable asset, and we value a balanced life while tackling challenging projects in a fast-paced environment. One last thing: Our four-day week continues in Summer 2025!

This role has an expectation of 3 days in the Columbia, MO office per week, subject to change based on future business needs.

What you'll be doing:

Support DevOps at CARFAX as an engineer in our observability practice.

Maintain the observability tool stack used by teams throughout CARFAX.

Work in a dynamic, agile, team environment helping keep CARFAX’s applications up and running.

Collaborate with engineering teams to design and build monitoring solutions.

Respond to major incidents. Help teams troubleshoot their products and restore service.

Collaborate closely with DevOps and engineering teams to implement observability best practices.

Reduce toil by creating observability automation that can be reused across our teams.

Continuously analyze and evaluate our systems, products, and processes for potential improvements.

What we're looking for:

Five or more years of experience with observability solutions.

Experience with the following:

Maintaining cloud infrastructure via IaC (Terraform preferred).

AWS EKS and monitoring solutions for K8s.

Prometheus and Grafana to collect and visualize metrics.

Platforms such as New Relic, DataDog or Splunk to collect metric and event data.

Log management: experience operating and managing a large-scale ELK stack.

Monitoring and alerting: experience analyzing applications and infrastructure and determining the right type of monitoring and alerting.

Experience with our tech stack: AWS (EKS), Prometheus / Grafana, Terraform / Consul / Vault, NodeJS / GoLang, Java.

Strong believer in reducing toil for yourself and teammates.

Ability to troubleshoot complex systems and help resolve major incidents.

Strong communication skills for documenting best practices to be implemented.

What’s in it for you:

Competitive compensation, benefits and generous time-off policies

4-day summer work weeks and a winter holiday break

401(k)/DCPP matching

Annual bonus program

Casual, dog-friendly, and innovative office spaces

Don’t just take our word for it:

10X Virginia Business Best Places to Work

9X Washingtonian Great Places to Work

9X Washington Post Top Workplace

St. Louis Post-Dispatch Best Places to Work
