Company Overview
We are a Hudson Oaks, Texas-based Internet Service Provider (ISP) delivering high-speed internet and voice
services throughout multiple states to residential, business, K-12 education, and government customers. We
believe there is much more to an internet company than just delivering cost-effective internet solutions; we
believe in delivering an overall customer experience that our competitors simply cannot match.
Job Summary:
Design, build, and operate Nextlink's CI/CD, GitOps, container, and infrastructure-as-code platforms across
on-prem datacenters and public cloud. Partner with Engineering, Field, NOC, and Security to automate
workflows, improve reliability, and accelerate delivery for customer-impacting services. This role also serves
as a subject matter expert for network and infrastructure devices, owning monitoring device models, standards,
and pre-production testing aligned to the Nextlink LaunchPad process. The current stack includes GitHub/GitLab
CI, Terraform/Ansible, Docker/Kubernetes, and Grafana/Prometheus/ELK.
Responsibilities:
Reasonable accommodations will be made to enable individuals with disabilities to perform the essential
functions.
Monitoring Device Models & Telemetry:
• Develop, test, and maintain up-to-date device models for Nextlink monitoring systems (SNMP, API,
streaming telemetry).
• Collaborate with Engineering, Field, and NOC to ensure correct monitoring, data collection, thresholds,
and alert standards.
Network Monitoring Systems (NMS) - Zabbix (Design & Architecture):
• Own platform architecture for Zabbix (server, proxies, and backing database, e.g., PostgreSQL with a
time-series extension), including HA/failover, housekeeping, and retention policies.
• Create and maintain device templates (SNMPv3, API, JMX/IPMI/SSH as applicable) with low-level
discovery (LLD), item/trigger prototypes, macros, preprocessing, and escalation logic that match
Nextlink standards.
• Distributed monitoring at scale: design proxy placement and discovery to cover POPs/datacenters and
edge sites; ensure secure comms (TLS, PSKs/certs) and reliable buffering.
• Alert quality & noise reduction: implement trigger dependencies, event correlation, maintenance
windows, and SLA/service maps; tune thresholds from SLOs and NOC feedback.
• Automation & "Zabbix-as-Code": manage templates, host onboarding, actions, and maintenance via the
Zabbix API and Git-based workflows; integrate with CI/CD to promote monitoring changes through
environments.
• Integrations: connect Zabbix to ChatOps (Teams/Slack), ticketing, and paging; publish dashboards for
NOC/leadership; export metrics/events to the observability stack where useful.
• Security: enforce RBAC, SNMPv3, secret rotation, and least-privilege API tokens; document and test
upgrades and rollbacks for zero/minimal downtime.
Automation & Network Change:
• Identify, develop, and maintain scripts/tools to automate processes and network/device changes (Python,
Bash, PowerShell).
• Enforce configuration baselines, drift detection, and golden-config rollouts; integrate change control and
approvals.
CI/CD, GitOps & Release Engineering:
• Build and maintain CI/CD pipelines (reusable templates, quality gates, artifact/versioning, blue/green &
canary).
• Implement GitOps for Kubernetes and network automation (Argo CD/Flux) using Helm/Kustomize and
policy controls.
• Support ephemeral environments, infrastructure testing, and progressive delivery with feature flags as
applicable.
Infrastructure as Code, Datacenter & Cloud:
• Plan, deploy, and maintain physical servers and datacenter assets (capacity, ordering, lifecycle,
firmware).
• Provision cloud resources (Azure, AWS, GCP) using Terraform with least-privilege identities and
tagging/FinOps standards.
• Implement secure networking (VNet/VPC, private endpoints, peering, DNS/TLS, load balancing,
WAF).
Observability & SRE:
• Own metrics, logs, traces, and profiling via Prometheus/Grafana and ELK; leverage eBPF where
appropriate.
• Define SLIs/SLOs, manage error budgets, and lead incident response/post-incident reviews alongside
the NOC.
Security, Compliance & Supply Chain:
• Embed DevSecOps: secret rotation, workload identity federation (OIDC), and least privilege across
platforms.
• Establish software supply-chain controls: SBOM (CycloneDX), image signing (Sigstore cosign),
provenance (SLSA), and policy-as-code (OPA/Kyverno).
• Automate vulnerability management, patching, and CIS/NIST-aligned hardening.
AIOps & ChatOps:
• Integrate AIOps for anomaly detection, noise reduction, and incident summarization; apply LLMs to
enhance runbooks and root-cause hypotheses.
• Implement ChatOps for deployments, rollbacks, and diagnostics via Teams/Slack bots with guardrails.
Standards, Testing & Documentation:
• Create standards for device configuration and proper use; test/qualify new devices before production per
Nextlink LaunchPad.
• Document architectures, runbooks, and SOPs; provide training to stakeholders and track/report on
projects.
Technical Skills:
• Bachelor's degree in CS/IT/Engineering or equivalent experience.
• Experience designing and operating Zabbix at scale (server, proxies, HA, PostgreSQL/time-series),
building templates/LLD with macros and trigger logic, and automating changes via the Zabbix API.
• 3+ years in DevOps/SRE/Platform Engineering supporting production systems.
• Strong coding/scripting in Python and one additional language (e.g., Bash or PowerShell).
• Hands-on with CI/CD (GitHub/GitLab), IaC (Terraform), and configuration management (Ansible).
• Proficient with Linux, containers (Docker), and Kubernetes (cluster operations, Helm/Kustomize,
GitOps).
• Solid networking fundamentals and operations in ISP contexts (routing basics, SNMP, NetFlow/sFlow).
• Experience with observability stacks (Prometheus/Grafana, ELK) and incident response.
Preferred Qualifications:
• Proven Zabbix architecture work (multi-proxy, distributed sites), API-driven onboarding, and
integrations with ChatOps/ticketing.
• Experience with AKS/EKS/GKE and GitOps controllers (Argo CD/Flux).
• Knowledge of zero-trust patterns, workload identity with OIDC, and secrets management (Key Vault,
Vault).
• Familiarity with SBOMs, SLSA, image signing, and policy-as-code frameworks (OPA/Kyverno,
Conftest).
• Exposure to AIOps tooling and ChatOps automation.
• Understanding of network automation APIs and telemetry for vendor devices common to ISPs.
Work Environment/Hazards:
• Working conditions primarily inside an office environment
• The noise level in the work environment is usually moderate
• High level of interaction with external/internal clients
Working Hours/Days:
• Full-time
• Standard business hours with flexibility as needed
• Occasional on-call responsibilities for critical projects
Travel Requirements:
• Able to travel 0-10% of the time
Affirmative Action (AAP/EEO Statement):
Nextlink is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment
without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected
veteran status and will not be discriminated against based on disability.
Drug Free Workplace:
Nextlink intends to provide a safe work environment that will help protect the safety, health and well-being of all
employees. Therefore, we are committed to an alcohol and drug-free workplace.