Development Operations Engineer

Company:
Nextlink
Location:
Hudson Oaks, TX
Posted:
February 18, 2026

Description:

Company Overview

We are a Hudson Oaks, Texas-based Internet Service Provider (ISP) delivering High Speed Internet and Voice Services throughout multiple states to residential, business, K-12 Education and government customers. We believe there is much more to an internet company than just delivering cost-effective internet solutions; we believe in delivering an overall customer experience that our competitors simply cannot match.

Job Summary:

Design, build, and operate Nextlink's CI/CD, GitOps, container, and infrastructure-as-code platforms across on-prem datacenters and public cloud. Partner with Engineering, Field, NOC, and Security to automate workflows, improve reliability, and accelerate delivery for customer-impacting services. This role also serves as a subject matter expert for network and infrastructure devices, owning monitoring device models, standards, and pre-production testing aligned to the Nextlink LaunchPad process. Current stack includes GitHub/GitLab CI, Terraform/Ansible, Docker/Kubernetes, and Grafana/Prometheus/ELK.

Responsibilities:

Reasonable accommodations will be made to enable individuals with disabilities to perform the essential functions.

Monitoring Device Models & Telemetry:

• Develop, test, and maintain up-to-date device models for Nextlink monitoring systems (SNMP, API, streaming telemetry).

• Collaborate with Engineering, Field, and NOC to ensure correct monitoring, data collection, thresholds, and alert standards.

Network Monitoring Systems (NMS) - Zabbix (Design & Architecture):

• Own platform architecture for Zabbix (server, proxies, and backing database, e.g., PostgreSQL with time-series extension), including HA/failover, housekeeping, and retention policies.

• Create and maintain device templates (SNMPv3, API, JMX/IPMI/SSH as applicable) with low-level discovery (LLD), item/trigger prototypes, macros, preprocessing, and escalation logic that match Nextlink standards.

• Distributed monitoring at scale: design proxy placement and discovery to cover POPs/datacenters and edge sites; ensure secure comms (TLS, PSKs/certs) and reliable buffering.

• Alert quality & noise reduction: implement trigger dependencies, event correlation, maintenance windows, and SLA/service maps; tune thresholds from SLOs and NOC feedback.

• Automation & "Zabbix-as-Code": manage templates, host onboarding, actions, and maintenance via the Zabbix API and Git-based workflows; integrate with CI/CD to promote monitoring changes through environments.

• Integrations: connect Zabbix to ChatOps (Teams/Slack), ticketing, and paging; publish dashboards for NOC/leadership; export metrics/events to your observability stack where useful.

• Security: enforce RBAC, SNMPv3, secret rotation, and least-privilege API tokens; document and test upgrades and rollbacks for zero/minimal downtime.

Automation & Network Change:

• Identify, develop, and maintain scripts/tools to automate processes and network/device changes (Python, Bash, PowerShell).

• Enforce configuration baselines, drift detection, and golden-config rollouts; integrate change control and approvals.

CI/CD, GitOps & Release Engineering:

• Build and maintain CI/CD pipelines (reusable templates, quality gates, artifact/versioning, blue/green & canary).

• Implement GitOps for Kubernetes and network automation (Argo CD/Flux) using Helm/Kustomize and policy controls.

• Support ephemeral environments, infrastructure testing, and progressive delivery with feature flags as applicable.

Infrastructure as Code, Datacenter & Cloud:

• Plan, deploy, and maintain physical servers and datacenter assets (capacity, ordering, lifecycle, firmware).

• Provision cloud resources (Azure, AWS, GCP) using Terraform with least-privilege identities and tagging/FinOps standards.

• Implement secure networking (VNet/VPC, private endpoints, peering, DNS/TLS, load balancing, WAF).

Observability & SRE:

• Own metrics, logs, traces, and profiling via Prometheus/Grafana and ELK; leverage eBPF where appropriate.

• Define SLIs/SLOs, manage error budgets, and lead incident response/post-incident reviews alongside the NOC.

Security, Compliance & Supply Chain:

• Embed DevSecOps: secret rotation, workload identity federation (OIDC), and least privilege across platforms.

• Establish software supply-chain controls: SBOM (CycloneDX), image signing (Sigstore cosign), provenance (SLSA), and policy-as-code (OPA/Kyverno).

• Automate vulnerability management, patching, and CIS/NIST-aligned hardening.

AIOps & ChatOps:

• Integrate AIOps for anomaly detection, noise reduction, and incident summarization; apply LLMs to enhance runbooks and root-cause hypotheses.

• Implement ChatOps for deployments, rollbacks, and diagnostics via Teams/Slack bots with guardrails.

Standards, Testing & Documentation:

• Create standards for device configuration and proper use; test/qualify new devices before production per Nextlink LaunchPad.

• Document architectures, runbooks, and SOPs; provide training to stakeholders and track/report on projects.

Technical Skills:

• Bachelor's degree in CS/IT/Engineering or equivalent experience.

• Experience designing and operating Zabbix at scale (server, proxies, HA, PostgreSQL/time-series), building templates/LLD with macros and trigger logic, and automating changes via the Zabbix API.

• 3+ years in DevOps/SRE/Platform Engineering supporting production systems.

• Strong coding/scripting in Python and one additional language (e.g., Bash or PowerShell).

• Hands-on with CI/CD (GitHub/GitLab), IaC (Terraform), and configuration management (Ansible).

• Proficient with Linux, containers (Docker), and Kubernetes (cluster operations, Helm/Kustomize, GitOps).

• Solid networking fundamentals and operations in ISP contexts (routing basics, SNMP, NetFlow/sFlow).

• Experience with observability stacks (Prometheus/Grafana, ELK) and incident response.

Preferred Qualifications:

• Proven Zabbix architecture work (multi-proxy, distributed sites), API-driven onboarding, and integrations with ChatOps/ticketing.

• Experience with AKS/EKS/GKE and GitOps controllers (Argo CD/Flux).

• Knowledge of zero-trust patterns, workload identity with OIDC, and secrets management (Key Vault, Vault).

• Familiarity with SBOMs, SLSA, image signing, and policy-as-code frameworks (OPA/Kyverno, Conftest).

• Exposure to AIOps tooling and ChatOps automation.

• Understanding of network automation APIs and telemetry for vendor devices common to ISPs.

Work Environment/Hazards:

• Working conditions primarily inside an office environment

• The noise level in the work environment is usually moderate

• High level of interaction with external/internal clients

Working Hours/Days:

• Full-time

• Standard business hours with flexibility as needed

• Occasional on-call responsibilities for critical projects

Travel Requirements:

• Able to travel 0-10% of the time

Affirmative Action (AAP/EEO Statement):

Nextlink is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against based on disability.

Drug Free Workplace:

Nextlink intends to provide a safe work environment that will help protect the safety, health and well-being of all employees. Therefore, we are committed to an alcohol and drug-free workplace.
