Job Title: Observability, Monitoring, DevOps, or Site Reliability Engineer
Job Location: Boston MA 02110 (Remote)
Onsite Requirements:
SolarWinds Orion
Prometheus
Grafana, Telegraf, Alertmanager
Job Description:
The client is migrating the devices it currently monitors with SolarWinds Orion to Prometheus.
Both SolarWinds and Prometheus provide real-time monitoring of servers and VMs.
The environment to be monitored consists of Windows servers (2019 and up) and VMware.
Responsibilities:
Prometheus is an open-source systems monitoring and alerting toolkit designed for reliability and scalability.
The client's Prometheus stack consists of:
Grafana - A tool for creating, exploring, and sharing dashboards using metrics captured by Prometheus.
Alertmanager - Alerts in Prometheus are sent to Alertmanager, which silences, inhibits, and aggregates them and sends out notifications.
Telegraf - An agent that collects metrics from the monitored systems and can expose them for Prometheus to scrape.
Must Haves:
5+ years of professional IT experience
At least 2 years of hands-on professional experience working with or administering the Prometheus stack (Prometheus, Alertmanager, Grafana, Telegraf)
Experience working with Prometheus in Windows Server (2019 and above) and VMware environments
This person needs to be able to work independently and with limited oversight
Ideally, people with this skillset will come out of larger, more robust Windows/VM environments
Big Plusses:
Experience with SolarWinds Orion
Any experience deploying Prometheus
3rd party and subcontract staffing agencies are not eligible for partnership on this position. 3rd party subcontractors need not apply.
This position requires candidates to be eligible to work in the United States, directly for an employer, without sponsorship now or anytime in the future.