Hiring: Senior Site Reliability Engineer (SRE) – W2 Only, Onsite – Columbus, OH
Role Overview
We are looking for a Senior Site Reliability Engineer (SRE) who will be responsible for ensuring the stability, performance, and scalability of large-scale enterprise systems in production environments. This role focuses heavily on keeping critical platforms healthy, highly available, and resilient while minimizing downtime and performance issues.
The engineer will work closely with development, infrastructure, security, and networking teams to support end-to-end production systems. A key part of the role involves monitoring system behavior, identifying issues before they impact users, and driving quick resolution during incidents.
You will be expected to build and maintain monitoring dashboards, alerts, and observability solutions using tools like Splunk, ELK, Grafana, Prometheus, and similar APM platforms. The role also involves analyzing logs, metrics, and system patterns to identify bottlenecks and improve overall platform performance.
In addition, the candidate will contribute to automation, CI/CD processes, and infrastructure reliability improvements, ensuring systems are efficient, scalable, and easier to maintain. Strong hands-on experience with distributed systems, databases, and cloud or container platforms is essential.
Requirements:
Strong knowledge of distributed systems, algorithms, relational & NoSQL databases
Experience with Kafka, MQ, Redis, Memcached
Hands-on experience with APM and observability tools: Splunk, ELK, Grafana, Prometheus, GCL
Ability to build dashboards, alerts, and monitoring systems
Strong experience with Java/J2EE (Spring, Spring Boot), Python, and shell scripting
Experience with Oracle, MongoDB