Lead DevOps/MLOps Engineer

Company:

Razor Talent

Location:

Reston, VA, 20190

Posted:

May 08, 2026

Description:

We're looking for a strong DevOps engineer who can help scale and operationalize our infrastructure as the platform grows. This is not a pure platform-architecture role — the focus is CI/CD, infrastructure automation, deployment reliability, observability, and GPU-oriented workload scaling.

What You'll Own

Improve CI/CD pipelines, deployment workflows, and release reliability

Standardize infrastructure and deployment patterns across environments

Improve observability through logging, metrics, tracing, dashboards, and rollout monitoring

Partner closely with backend engineering on:

deployment strategies

infrastructure automation

environment consistency

migration workflows

possible Kubernetes migration efforts

Support ML-oriented infrastructure as a secondary responsibility:

SageMaker workloads

Ray clusters

GPU scaling patterns

distributed batch execution

autoscaling behavior

runtime/image management

artifact delivery/versioning

The Kind of Problems You'll Work On

Deployment safety and rollback strategies

Infrastructure consistency across environments

Release automation and environment promotion flows

Autoscaling and runtime stability

GPU workload orchestration and scaling efficiency

Operational tooling that reduces friction for engineering teams

Stack

AWS

Terraform

Docker

Kubernetes

CI/CD systems

SageMaker

Ray

GPU compute infrastructure

You'll Probably Do Well Here If

You've operated production infrastructure at meaningful scale

You're strong in practical DevOps execution and operational reliability

You care about automation, observability, and deployment safety

You're comfortable improving developer workflows and infrastructure tooling

You've worked with distributed systems or GPU-oriented workloads before
