MLOps and Data Engineer
We are hiring for a fast-growing, well-funded, 25-person start-up at the forefront of generative video technology. They are looking for a Senior MLOps & Data Engineer to join their growing team.
What You'll Do
Own and evolve our ML data and training pipelines — from ingestion to deployment — ensuring they are scalable, efficient, and reliable
Build, deploy, and maintain distributed training jobs (e.g. with Megatron or other frameworks) across multi-node clusters
Design and manage robust data processing workflows, supporting both real-time and batch operations
Collaborate closely with research and engineering teams to bring ML models from prototype to production
Optimize inference pipelines for low-latency, cost-effective serving
Monitor and improve system performance, reliability, and security across the ML lifecycle
Key Qualifications
4+ years of experience in MLOps, ML infrastructure, or data engineering roles
Hands-on experience deploying and maintaining distributed training jobs on GPU clusters
Deep understanding of MLOps best practices and tools (CI/CD for ML, experiment tracking, versioning, etc.)
Strong experience with AWS or GCP
Proficiency in Python and common data/ML tooling (e.g., PyTorch, TensorFlow, Airflow, Kubeflow)
Experience with inference optimization for large models, focusing on latency, throughput, and cost
Strong systems design skills, including distributed systems, orchestration, and monitoring
Bonus: Experience with Megatron or similar large-model training frameworks, or with TPU/GPU optimization
Please apply for more details.