Research Engineer – Interpretability Systems
San Francisco, CA (Onsite)
Early-stage, revenue-generating AI research lab
An AI research lab working at the frontier of interpretability, alignment, and reinforcement learning is hiring Research Engineers focused on understanding what's happening inside large language models.
This role is for engineers who want to build the experimental systems that make interpretability research possible, not production ML, MLOps, or large-scale training infrastructure.
You’ll work on:
Activation tracing & mechanistic analysis
Custom RL-style environments for alignment research
Probing internal representations
Detecting latent concepts like deception, goals, uncertainty, or hidden objectives
Activation-level steering beyond prompting and fine-tuning
New benchmarks for model consistency and robustness
The work is fast, experimental, and greenfield: build custom tooling, test research ideas, get results, move on.
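For a concrete flavor of the day-to-day, here's a minimal sketch of the simplest move on that list, probing internal representations: hook a transformer block, collect activations, and fit a linear probe. The model ("gpt2"), the layer choice, and the tiny labeled prompt set are illustrative assumptions, not our actual stack.

```python
# Minimal sketch: capture activations with a forward hook, fit a linear probe.
# Model, layer, and prompts are stand-ins for illustration only.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

acts = []
def grab(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] is (batch, seq, hidden).
    # Keep the final token's activation as a summary of the sequence.
    acts.append(output[0][:, -1, :].detach())

handle = model.transformer.h[6].register_forward_hook(grab)  # mid-depth block

# Hypothetical labeled data: statements phrased with vs. without certainty.
prompts = ["The capital of France is Paris.",
           "The capital of France might be Lyon.",
           "Water boils at 100 C at sea level.",
           "Water possibly boils at 90 C at sea level."]
labels = [1, 0, 1, 0]

with torch.no_grad():
    for p in prompts:
        model(**tok(p, return_tensors="pt"))
handle.remove()

X = torch.cat(acts).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("train accuracy:", probe.score(X, labels))
```

Real probing work adds larger datasets, held-out evaluation, and causal checks, but the experimental loop stays about this short: build, run, read off a result, iterate.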
Ideal background:
Strong software engineering fundamentals
Experience with experimental ML / research systems
Comfort working close to model internals
Interest in interpretability, alignment, RL, or mechanistic understanding
PhD helpful, not required
This is not a role for scaling pipelines or maintaining production systems. It's for people who enjoy ambiguous problems, fast research cycles, and building new tools from first principles.
Interested? Apply and drop me a message!