Founding ML Engineer
Location: San Francisco, CA
Company Stage: Early-Stage (YC-backed, Profitable, High-Growth)
Office Type: Onsite
Salary: $150,000 – $300,000 + Equity (0.10% – 0.50%)
This fast-growing, venture-backed startup is building the core infrastructure layer that enables AI agents to access, understand, and act on real-time internet data. Instead of traditional search workflows designed for humans, the platform provides APIs that allow AI systems to retrieve high-fidelity, structured data directly from source systems.
The company has achieved strong early traction, scaling to millions in ARR within its first year, and already serves enterprise customers. Backed by leading investors including Y Combinator and top-tier venture firms, the team is now focused on pushing the boundaries of applied machine learning to power the next generation of AI-native data systems.
What You Will Do
Own the end-to-end development of core ML systems, from research and modeling to production deployment
Design and train models for information retrieval, entity resolution, classification, and structured data extraction
Build systems that transform messy, multilingual web-scale data into structured, queryable intelligence
Develop embedding models, ranking systems, and retrieval pipelines for high-precision search and matching
Apply transformer architectures and modern NLP techniques to real-world data problems
Leverage LLMs for tasks such as extraction, classification, and data enrichment at scale
Continuously evaluate and improve model performance using rigorous experimentation and metrics
Work closely with engineering and product teams to integrate ML systems into production APIs
Ideal Background
3+ years of experience building and shipping production ML systems, particularly in NLP, information retrieval, or entity resolution
Strong hands-on experience with Python and PyTorch
Deep understanding of transformer architectures, including training and fine-tuning encoder models
Experience building retrieval systems, classifiers, or embedding-based systems
Familiarity with representation learning techniques (e.g., contrastive learning, metric learning)
Experience applying LLMs to structured data problems (e.g., extraction, classification, generation)
Strong problem-solving skills with the ability to work on ambiguous, large-scale data challenges
High-ownership mindset with a strong bias toward execution in fast-paced environments
Preferred
Experience with entity resolution or record linkage at scale
Background in multilingual or cross-lingual NLP
Experience building taxonomies, ontologies, or knowledge systems
Familiarity with distributed training on GPU clusters
Experience scaling LLM inference pipelines in production
Research publications or open-source contributions in NLP/IR
Compensation and Benefits
Base salary: $150K – $300K
Equity: 0.10% – 0.50% (founding-level ownership)
Visa sponsorship available
Opportunity to join at an early stage with strong product-market fit and rapid growth
High-ownership role with direct impact on the core product and company trajectory
Work alongside experienced founders and top-tier investors
This role is ideal for ML engineers who want to operate at the frontier of applied NLP and retrieval, owning the core intelligence systems that power how AI agents interact with real-world data at scale.