Member of Technical Staff - ML Infrastructure/MLOps

Company:
Acceler8 Talent
Location:
Palo Alto, CA, 94306
Posted:
May 24, 2025

Description:

Member of Technical Staff, MLOps / ML Infrastructure

We're hiring a Member of Technical Staff to join our MLOps / ML Infrastructure team. If you're an engineer who cares about performance, systems, and reliability, and you want to work on infrastructure that directly supports real-world machine learning applications, this is a role where your skills will matter.

You’d be joining a company built by experienced leaders in AI who are now focused on applying large-scale language models to enterprise problems. The team behind the product has already demonstrated success with a 350B+ parameter frontier model powering a widely used conversational agent. Now the emphasis is on delivering scalable and secure systems for business environments.

As a Member of Technical Staff - ML Infrastructure, you’ll help shape the infrastructure that enables end-to-end ML workflows, including model training, deployment, and production orchestration. You’ll design and maintain control planes and distributed systems that support high-volume, high-availability environments. This is a role for engineers who are comfortable working with large-scale clusters, who like digging into systems, and who want to have a measurable impact on a mature but still evolving platform.

What we can offer you:

Salary range of $175,000 – $350,000, depending on experience and scope

Work with a technically strong and collaborative team

Clear ownership and autonomy over infrastructure components

Direct impact on production systems used in real enterprise deployments

Flexible working arrangements with a hybrid or remote structure

Access to significant compute and internal models for experimentation

Key responsibilities:

Build and operate scalable infrastructure for ML training and deployment pipelines

Design and manage control planes and service tooling for ML applications

Maintain and optimize Kubernetes, SLURM, and Ray clusters across environments

Support reliability and uptime goals across production ML systems

Partner with researchers and engineers to align infrastructure with model development needs

Monitor, debug, and improve system performance, efficiency, and security posture

If you're comfortable working with Kubernetes, SLURM, Ray, Terraform, CI/CD, distributed systems, and security best practices, and you enjoy applying them in ML production environments, this role may align well with your background.

As a Member of Technical Staff - ML Infrastructure, you’ll be central to our infrastructure strategy, building systems that enable engineers and researchers to move quickly, safely, and at scale.
