Site Reliability Engineer - Big Data

Company:

Verisign

Location:

Reston, VA

Posted:

May 11, 2025

Description:

Verisign helps enable the security, stability, and resiliency of the internet. We are a trusted provider of internet infrastructure services for the networked world and deliver unmatched performance in domain name system (DNS) services.

We are a mission focused, values driven company where each individual can contribute to building a stronger, more secure internet. We offer a dynamic and flexible work environment with competitive benefits and the ability to grow your career.

Within Verisign, our team is responsible for building and managing the Verisign Data Platform, which enables the creation of large-scale, high-throughput (millions of requests per second) data products and services that deliver actionable operational and business intelligence. To help us advance the platform, we are looking for a highly skilled Mid-level Site Reliability Engineer (SRE). This role will play a critical part in ensuring the stability, performance, and security of our data platforms.

An ideal candidate should care deeply about big data systems and automation, be fluent in Infrastructure-as-Code and CI/CD, and be eager to learn as needed. The successful candidate should understand the fundamentals, including core Computer Science concepts, operating systems, networking, file systems, and databases, and have hands-on experience managing large-scale distributed systems. Acquiring these competencies typically requires the equivalent of a bachelor's degree and 6 or more years of practical work experience. We are also open to other career paths.

The candidate will be involved in all aspects of the data platform, including ideation, design, implementation, deployment, customer onboarding, and support. This implies regular cross-team collaboration with Data Engineering, Infrastructure, Engineering, Security, and Operation Teams. We expect the candidate, as part of the team, to take ownership of the data platform by regularly interacting with internal customers and proactively identifying, prioritizing, and delivering on their common data platform needs.

Key Responsibilities:

Architect, design, deploy, monitor, and operate large-scale data platforms such as Hadoop, Kafka, Spark, and Druid, running both on physical servers and on top of Kubernetes

Participate in technical designs and proofs of concept for software solutions that combine open-source components, COTS (commercial off-the-shelf) components, and custom-developed components

Deploy and manage production releases with minimal supervision

Automate cluster provisioning (CI/CD, Infrastructure-as-Code), scaling, and monitoring using Ansible, Python, Jenkins, Terraform and other relevant tools

Build and deploy containerized applications using Docker and Kubernetes

Troubleshoot complex issues in large, distributed environments

Upgrade large-scale data platforms (including patching and deploying releases), improving system capabilities and security while ensuring minimal customer impact

Perform occasional operations support functions, including problem isolation and resolution

Participate in the on-call rotation to monitor the health of the production systems and respond to incidents or customer needs

Ensure platform SLOs by collecting, visualizing, and alerting on relevant telemetry

Support data platform customers and continuously improve the monitoring, performance, and functionality of the clusters

Stay up to date with industry best practices and standards for data platforms, with a focus on hybrid cloud environments

The candidate must have:

Bachelor's degree in computer science or a related technical field, or equivalent combination of education and experience

5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)

Excellent understanding of Linux configuration and administration

Strong automation experience - not just developing automation, but knowing why we automate and what to automate

Strong understanding of infrastructure-as-code

Strong written and verbal communication skills - able to clearly and succinctly describe complex issues

Familiarity with networking protocols and systems

Desired Skills, Experience, and Attributes:

Experience with a high-level scripting language such as Python

Experience with RedHat Enterprise Linux and/or FreeBSD

Experience with network troubleshooting using tools such as ping, traceroute, and dig

Deployment automation experience using tools such as Ansible

Experience working with teams using Kanban and/or Scrum a plus

Experience with Docker or Kubernetes in a production environment

Experience with OpenStack in a production environment

Experience administering Unix systems in a large-scale environment

Experience using Jenkins in a continuous delivery and integration environment

This position is based in our Reston, VA office.

The pay range is $108,900 - $147,300.

The anticipated annual base salary range for this position is noted above; however, the base pay offered may vary depending on job-related knowledge, skills, and experience. Verisign offers a discretionary bonus, which is based on individual and company performance, and certain roles may be eligible for discretionary stock awards.

Verisign is an equal opportunity employer. That means we recruit, hire, compensate, train, promote, transfer, and administer all terms and conditions of employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, age, protected veteran status, disability, or other protected categories under applicable law.

Additional Information:

Our Careers Page

Our Benefits Summary

Verisign in the Community

Our EEO Statement

Our Privacy Notice for Job Applicants/Candidates

Reasonable Accommodations

Staffing agency policy: No fees will be paid for unsolicited resumes submitted to Verisign or our employees by third parties.
