Machine Learning Engineer - MLOps Team

Company:
Aura Intelligence
Location:
New York City, NY
Posted:
May 13, 2025

Description:

We are seeking a skilled Machine Learning Engineer to join our growing MLOps team. In our fast-paced startup environment, you'll work closely with our data scientists and data engineers to productionize models and build robust, scalable classification systems. You will have the autonomy to drive the evolution of our ML infrastructure—from development through deployment—and make a direct impact on our product's success.

Key Responsibilities:

• Develop and Optimize ML Pipelines:

Write efficient, clean, and maintainable Python code to implement machine learning pipelines for various classification projects. Ensure these pipelines meet production-grade standards for performance and scalability.

• Production Deployment:

Deploy and optimize diverse classification models—including cross-encoders, bi-encoders, transformers, and custom PyTorch networks—ensuring effective GPU/CPU resource management, memory optimization, and scalability tuning.

• End-to-End System Ownership:

Take full responsibility for deployed ML systems, including incident response, performance monitoring, and ongoing quality maintenance with minimal supervision.

• Data Integration and Analysis:

Collaborate with the data engineering team to analyze model inputs/outputs, validate predictions, and explore potential feature improvements using SQL.

• MLOps Infrastructure & CI/CD:

Build and maintain robust, extensible, and reproducible MLOps infrastructure. Establish and manage CI/CD pipelines, and set up observability for system metrics, logs, and alerts.

• Collaboration and Continuous Improvement:

Contribute to technical design discussions, help break down implementation tasks, and address technical debt as needed. Work cross-functionally with both engineering and data science teams to continuously refine our deployment processes.

Day-to-Day Work

You'll be responsible for implementing and maintaining the classification systems that form the backbone of our data platform. This includes:

Translating research models into production-ready code.

Developing and optimizing inference pipelines.

Managing compute resources and scaling solutions.

Responding to and resolving production incidents.

Collaborating with data scientists to improve model deployment efficiency.

Writing and analyzing SQL queries to monitor model inputs/outputs and validate prediction quality.

We're looking for someone who can work independently and take ownership while maintaining high standards. You'll have a real impact in shaping our MLOps practices and building scalable ML systems that matter. If you're passionate about turning ML research into production-ready solutions and care about writing quality code, we'd love to hear from you.

Requirements

• Experience:

5+ years of hands-on experience in ML engineering or related roles if you hold a PhD; 7+ years otherwise.

Strong expertise in NLP methods and tools (transformers, BERT, GPT, LLMs).

Proven track record in entity resolution and large-scale data integration challenges.

Extensive production experience in MLOps (AWS, Snowflake, or similar cloud platforms).

Proficiency in Python, PyTorch, TensorFlow, or equivalent ML frameworks.

• Technical Skills:

Strong proficiency in Python and production-grade software development.

Solid understanding of GPU acceleration, resource optimization, and scalable ML inference pipelines.

Strong SQL skills and experience with data warehouses.

Familiarity with AWS (particularly S3) and broader cloud computing concepts.

• DevOps & Testing:

Experience setting up observability (metrics, logging, and alerting) for ML systems.

Proficiency with CI/CD practices and testing frameworks (pytest for unit, integration, and model evaluation testing).

Knowledge of version control (Git) and best practices in documentation.

• Problem-Solving & Communication:

Excellent problem-solving skills with a proven ability to debug complex production issues.

Strong communication skills to effectively collaborate with both technical and non-technical team members.

Preferred Qualifications

Experience with containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes).

Familiarity with serverless compute platforms (e.g., Modal) and workflow orchestration tools (e.g., Dagster).

Knowledge of additional classification models (e.g., XGBoost, FastText) and broader data engineering concepts.

Understanding of data versioning, experiment tracking, and model registry practices.

Experience with observability platforms (e.g., Datadog) and data warehouses like Snowflake.

Experience working with LLMs in production.

Full-time
