Zachary Piper Solutions is seeking a Site Reliability Engineer - Big Data responsible for building and managing a Data Platform that enables the creation of large-scale, high-throughput data products and services delivering actionable operational and business intelligence. This position is hybrid, with two days a week onsite in Reston, VA.
**Candidate must not require any work authorization**

Key Responsibilities:
- Architecting, deploying, and managing large-scale data platforms (Kafka, Spark, Hadoop, Druid) running on top of Kubernetes
- Automating cluster provisioning (CI/CD), scaling, and monitoring using Ansible, Python, and Jenkins
- Participating in technical designs for software solutions that combine open-source, commercial, and custom-developed components
- Ensuring platform SLOs by collecting, visualizing, and alerting on relevant telemetry
- Upgrading large-scale data platforms to improve system capabilities and security while ensuring minimal customer impact
- Troubleshooting complex issues in large, distributed environments
- Staying up to date with industry best practices and standards for data platforms, focusing on hybrid cloud environments
- Supporting data platform customers
- Participating in the on-call rotation, monitoring production systems, and responding to incidents

Requirements:
- Candidate must not require any work authorization
- Bachelor's degree in computer science or a related technical field, or an equivalent combination of education and experience
- 5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)
- Excellent understanding of Linux configuration and administration
- Strong automation experience - not just developing automation, but knowing why we automate and what to automate
- Strong understanding of infrastructure-as-code tools such as Ansible
- Experience with Docker or Kubernetes in a production environment
- Strong written and verbal communication skills - able to clearly and succinctly describe complex issues
Compensation: $140,000-$150,000/year **depending on years of experience and degree**

Full Benefits: Medical, Dental, Vision, 401K, Paid Holidays, PTO, Sick Leave if required by law

This job opens for applications on 12/4/2025.
Applications for this job will be accepted for at least 30 days from the posting date.

Keywords: Site Reliability Engineer, SRE, Big Data, Data Platform, Hybrid Cloud, Operational Intelligence, Business Intelligence, High-throughput Data Products, Distributed Systems, Kafka, Spark, Hadoop, Druid, Kubernetes, Docker, Linux Administration, Cluster Provisioning, CI/CD, Ansible, Python, Jenkins, Infrastructure-as-Code, Telemetry, Monitoring, Automation, Upgrades & Security, Troubleshooting, Open-Source Integration, Data Platform Management, Containerization, Configuration Management, Visualization & Alerting, On-call Rotation, Production Systems Monitoring, DevOps, Linux, automation, design, automate, large-scale, ideation, implementation, deployment, customer onboarding, support, cross-team collaboration, Data Engineering, Infrastructure, Engineering, Security, Operation Teams.