MLOps and Data Engineer

Company:

DeepRec.ai

Location:

Santa Rosa, CA, 95402

Posted:

May 15, 2025

Description:

We are hiring for a fast-growing, well-funded, 25-person start-up at the forefront of generative video technology. We’re looking for a Senior MLOps & Data Engineer to join their growing team.

What You'll Do

Own and evolve our ML data and training pipelines — from ingestion to deployment — ensuring they are scalable, efficient, and reliable

Build, deploy, and maintain distributed training jobs (e.g., with Megatron or other frameworks) across multi-node clusters

Design and manage robust data processing workflows, supporting both real-time and batch operations

Collaborate closely with research and engineering teams to bring ML models from prototype to production

Optimize inference pipelines for low-latency, cost-effective serving

Monitor and improve system performance, reliability, and security across the ML lifecycle

Key Qualifications

4+ years of experience in MLOps, ML infrastructure, or data engineering roles

Hands-on experience deploying and maintaining distributed training jobs on GPU clusters

Deep understanding of MLOps best practices and tools (CI/CD for ML, experiment tracking, versioning, etc.)

Strong experience with AWS or GCP

Proficiency in Python and common data/ML tooling (e.g., PyTorch, TensorFlow, Airflow, Kubeflow)

Experience with inference optimization for large models, focusing on latency, throughput, and cost

Strong systems design skills, including distributed systems, orchestration, and monitoring

Bonus: Experience with Megatron or similar large-model training frameworks, or with TPU/GPU optimization

Please apply for more details
