Machine Learning Engineer (Mid to Staff Level) | LLMs in Production | AI x Healthcare | San Mateo (Hybrid)
We’re working with a Series A AI company based in the Bay Area that’s putting large language models to work in real-world, high-impact healthcare workflows.
Location: San Mateo, CA - 4 days/week in office
Compensation: $85K–$260K base (DOE) + strong equity + full benefits
Stage: Series A | $19M raised | 7-figure ARR | Profitable with indefinite runway
What They’re Building
This startup is transforming how voice-based workflows are handled in healthcare - using production-grade Voice AI agents powered by LLMs to automate high-stakes interactions like insurance verifications, authorizations, claim status checks, and more.
This isn’t experimental - thousands of these AI-driven calls happen daily across major healthcare organizations.
The Role: Machine Learning Engineer (Open to Mid & Staff Levels)
This role sits at the intersection of software engineering, MLOps, and LLM deployment. You’ll be part of a lean, highly experienced team shipping real AI to production.
Depending on your level, your focus might range from prompt design and internal tooling to building core ML infrastructure at scale.
What You’ll Work On:
Design and iterate on prompts for classification, summarization, extraction, and task automation
Build tools for prompt testing, versioning, and performance evaluation (see the sketch after this list)
Optimize and fine-tune LLMs for latency, cost, and alignment with business logic
Deploy and monitor ML models in production environments (on-prem + cloud)
Maintain robust MLOps pipelines for training, evaluation, and CI/CD
Contribute to infrastructure powering high-volume, real-time inference for voice agents
Collaborate cross-functionally with product and engineering to translate complex workflows into ML-powered solutions
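To give a flavor of the prompt-tooling work above, here is a rough, hypothetical sketch of a prompt-versioning and evaluation harness. Everything in it - the PromptVersion type, the eval cases, the stub model - is an illustrative assumption, not this company's actual stack:

```python
# Hypothetical sketch of a prompt-versioning and evaluation harness.
# All names here (PromptVersion, PROMPT_VERSIONS, run_eval, the stub
# model) are illustrative assumptions, not the company's real tooling.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PromptVersion:
    version: str
    template: str  # expects a {transcript} placeholder

# Two candidate prompts for a claim-status classification task.
PROMPT_VERSIONS = [
    PromptVersion("v1", "Classify the claim status in: {transcript}"),
    PromptVersion("v2", "You are an insurance-claims agent. Label the claim "
                        "status (approved/denied/pending) in: {transcript}"),
]

# Tiny labeled eval set; in practice this would be real call transcripts.
EVAL_CASES = [
    {"transcript": "The claim was approved on March 3rd.", "expected": "approved"},
    {"transcript": "We are still reviewing the claim.", "expected": "pending"},
]

def run_eval(llm: Callable[[str], str]) -> None:
    """Score each prompt version by exact-match accuracy on the eval set."""
    for pv in PROMPT_VERSIONS:
        hits = 0
        for case in EVAL_CASES:
            output = llm(pv.template.format(transcript=case["transcript"]))
            hits += output.strip().lower() == case["expected"]
        print(f"{pv.version}: {hits}/{len(EVAL_CASES)} correct")

if __name__ == "__main__":
    # Stub model so the harness runs offline; swap in a real LLM call.
    run_eval(lambda prompt: "approved" if "approved" in prompt else "pending")
```

In production you would swap the stub for a real LLM call and log results per prompt version in a tracking tool such as MLflow or Langfuse.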
About You
We’re open to both mid-level and staff-level candidates - and will tailor the scope to match your background.
You would be a great fit if you have:
3+ years (mid) or 5+ years (staff) in ML engineering, LLMs, or AI infra
Strong Python and experience with tools like Hugging Face, LangChain, LlamaIndex, PromptLayer, or Langfuse
Hands-on experience deploying and optimizing LLMs in production
Experience with model serving (Triton, ONNX, FastAPI) and containerized infra (Docker, K8s) - a minimal serving sketch follows this list
Familiarity with MLOps frameworks (MLflow, Kubeflow, SageMaker, Vertex AI)
Bonus: exposure to healthcare data formats (FHIR, HL7) or vector databases (Pinecone, Weaviate)
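To make the serving requirement concrete, here is a minimal, hypothetical FastAPI endpoint of the kind that bullet describes; the route, schema, and stub classifier are illustrative assumptions, not the company's actual service:

```python
# Hypothetical sketch of an LLM-backed serving endpoint; the route
# and schema are illustrative, not this company's real API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ClassifyRequest(BaseModel):
    transcript: str

class ClassifyResponse(BaseModel):
    label: str

def classify(transcript: str) -> str:
    # Stand-in for a real model call (e.g. a Triton or ONNX Runtime client).
    return "approved" if "approved" in transcript.lower() else "pending"

@app.post("/classify", response_model=ClassifyResponse)
def classify_endpoint(req: ClassifyRequest) -> ClassifyResponse:
    return ClassifyResponse(label=classify(req.transcript))
```

This runs locally with uvicorn (e.g. `uvicorn main:app` if the file is main.py); in a real deployment the stub would delegate to a model-serving backend behind the same interface.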
You care deeply about model performance, prompt design, and production-grade reliability - not just research experiments.
Why This Role Matters
You’ll ship AI into real-world, high-impact workflows - not just prototypes
The engineering bar is high - all your peers are experienced and hands-on
You’ll shape new infrastructure from scratch - this is greenfield, not maintenance
You’ll help scale LLMs in one of the most valuable and complex sectors in the U.S.
If you're excited about working on production LLM systems that actually matter - with a team that cares deeply about both quality and speed - send me a message to learn more.