
Senior MLOps Engineer

Company: DeepRec.ai
Location: San Jose, CA
Posted: May 15, 2025

Description:

Senior MLOps Engineer – Real-Time AI & Video Applications (Hybrid)

Office Location: San Jose (Hybrid)

Job Type: Full-time

We're hiring for an impressive AI company focused on real-time AI and video applications. Their team is made up of leading experts in computer graphics and generative modeling, and they are on a rapid growth trajectory. We're looking for experienced MLOps Engineers who want to work on real-time AI applications that are shaping the future of media.

The Role

We’re looking for a talented MLOps Engineer to build and maintain robust machine learning pipelines and infrastructure. You’ll be working closely with AI researchers, data scientists, and software engineers to deploy state-of-the-art models into production, optimize real-time inference, and ensure systems scale effectively.

What You’ll Do

Design and optimize ML pipelines for training, validation, and inference

Automate deployment of deep learning and generative models for real-time use

Implement versioning, reproducibility, and rollback capabilities

Deploy and manage containerized ML solutions on cloud platforms (AWS, GCP, Azure)

Optimize model performance using TensorRT, ONNX Runtime, and PyTorch

Work with GPUs, distributed computing, and parallel processing to power AI workloads

Build and maintain CI/CD pipelines using tools like GitHub Actions, Jenkins, ArgoCD

Automate model retraining, monitoring, and performance tracking

Ensure compliance with privacy, security, and AI ethics standards

What You Bring

3+ years of experience in MLOps, DevOps, or AI model deployment

Strong skills in Python and frameworks like TensorFlow, PyTorch, ONNX

Proficiency with Docker, Kubernetes, and serverless architectures

Hands-on experience with ML tools (Argo Workflows, Kubeflow, MLflow, Airflow)

Experience deploying and optimizing GPU-based inference (CUDA, TensorRT, DeepStream)

Solid grasp of CI/CD practices and scalable ML infrastructure

Passion for automation and clean, maintainable system design

Strong understanding of distributed systems

Bachelor’s or Master’s in Computer Science or equivalent work experience

Bonus Skills

Experience with CUDA programming

Exposure to LLMs and generative AI in production

Familiarity with distributed computing (Ray, Horovod, Spark)

Edge AI deployment experience (Triton Inference Server, TFLite, CoreML)

Basic networking knowledge

We look forward to hearing from you.
