Engineered to outperform, Teraswitch is on a mission to provide high-performance infrastructure services for critical workloads. With 20+ datacenter locations around the world interconnected by our low-latency global backbone network, we are the class leader in performance bare metal hosting and are rapidly expanding into additional infrastructure services.
The Job
The Infrastructure Engineering team at Teraswitch is responsible for the compute, storage, and platform infrastructure that powers our products and internal operations.
This senior/staff-level role will architect and lead our global, self-hosted Kubernetes deployment and help drive our cloud-native approach to both internal and customer-facing services. You’ll design for a self-hosted (bare metal) environment, without relying on cloud-managed control planes, load balancers, or databases. This role will also build reusable platform capabilities and operating models that make it easier for other teams to adopt Kubernetes, and will support the adoption of cloud-native practices across the organization.
While this role has a Kubernetes / platform focus, as a senior member of the Infrastructure Engineering team, you’ll also be expected to cross-train and contribute broadly across infrastructure domains as we grow the team.
What You’ll Do
Architect and lead our globally distributed, self-hosted Kubernetes deployment (including provisioning and management, multi-site app deployments, HA/DR strategy, and failure domains)
Define and implement our Kubernetes storage and networking strategies
Define and implement our Kubernetes security posture: secure network policies, RBAC, container security, secrets management, vulnerability management, compliance-oriented controls and reporting
Drive modern, cloud-native observability/monitoring for the platform and workloads
Deliver platform capabilities for developers: namespaces/tenancy model, “paved road” patterns and templates, standard ingress/certs/secrets approaches, documentation
Collaborate with and support the Software team and other internal stakeholders on cloud-native deployments, acting as an internal cloud-native SME
Cross-train with the rest of the Infrastructure Engineering team and contribute broadly to the compute, storage, and platform infrastructure that powers Teraswitch products and internal operations
Basic Qualifications
Strong experience operating production Kubernetes, including cluster lifecycle responsibilities (e.g. provisioning, management, upgrades, observability, troubleshooting, storage, networking)
Experience in self-hosted Kubernetes architectures (i.e. without relying on cloud-managed control planes and managed apps)
Experience with internal platform capabilities (GitOps/CI/CD integration, paved roads, developer enablement)
Experience with cloud-native observability/monitoring (metrics, logs, traces, alerting)
Strong Linux systems and networking expertise
Comfortable working in a fast-paced, results-oriented environment
Committed to operational best practices and security by design
Preferred Skills/Experience
You do not need all of these; depth in a few areas plus strong fundamentals is sufficient:
Experience with multi-cluster, multi-region Kubernetes management (including app deployments and HA/DR strategy)
Deep Kubernetes storage knowledge; hands-on experience managing and integrating persistent software-defined cluster storage (e.g. Longhorn, Ceph, VAST)
Deep Kubernetes networking knowledge; hands-on experience with advanced cluster networking (e.g. BGP and other mechanisms for workload HA)
Solid understanding of and experience implementing Kubernetes security best practices (secure network policies / workload security, RBAC, secrets management, vulnerability management, compliance-oriented controls/reporting)
Cloud-native database self-hosting / management experience (MySQL, Postgres) - for example, using tools like CloudNativePG or Vitess
Experience with KubeVirt and/or other VM-on-Kubernetes deployments
Production-grade, cloud-native observability design (metrics/logs/traces correlation, OpenTelemetry pipelines, Prometheus/Grafana)
Service / hosting provider experience (multi-tenant systems, automation-first operations, scalable and secure design)
Automation experience - scripting (e.g. Python, Bash) and/or configuration management (e.g. Ansible)
Experience with CI/CD and/or GitOps deployment models and workflows
Experience in other Infrastructure team domains - e.g. distributed storage systems (block or object storage services), KVM-based virtualization (cloud services), and/or bare metal automation / fleet management
On-Call / Operations
Participate in an on-call rotation supporting critical production systems.
Location
Preference given to full-time onsite candidates in Pittsburgh, PA, followed by hybrid candidates.
Compensation and Benefits
Along with a competitive pay scale, full-time Teraswitch employees are eligible for the following benefits:
Health, Dental, and Vision Insurance
401(k) with company profit sharing
PTO and 11 Company-Paid Holidays