Research Engineer – Interpretability Systems
San Francisco, CA (Onsite)
Early-stage, revenue-generating AI research lab
An AI research lab working at the frontier of interpretability, alignment, and reinforcement learning is hiring Research Engineers focused on understanding what's happening inside large language models.
This role is for engineers who want to build the experimental systems that make interpretability research possible, not production ML, MLOps, or large-scale training infrastructure.
You’ll work on:
Activation tracing & mechanistic analysis
Custom RL-style environments for alignment research
Probing internal representations
Detecting latent concepts like deception, goals, uncertainty, or hidden objectives
Activation-level steering beyond prompting and fine-tuning
New benchmarks for model consistency and robustness
The work is fast, experimental, and greenfield: build custom tooling, test research ideas, get results, move on.
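For a concrete flavor of the day-to-day, here's a minimal sketch of the simplest move on that list, probing internal representations: hook a transformer block, collect activations, and fit a linear probe. The model ("gpt2"), the layer choice, and the tiny labeled prompt set are illustrative assumptions, not our actual stack.

```python
# Minimal sketch: capture activations with a forward hook, fit a linear probe.
# Model, layer, and prompts are stand-ins for illustration only.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

acts = []
def grab(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] is (batch, seq, hidden).
    # Keep the final token's activation as a summary of the sequence.
    acts.append(output[0][:, -1, :].detach())

handle = model.transformer.h[6].register_forward_hook(grab)  # mid-depth block

# Hypothetical labeled data: statements phrased with vs. without certainty.
prompts = ["The capital of France is Paris.",
           "The capital of France might be Lyon.",
           "Water boils at 100 C at sea level.",
           "Water possibly boils at 90 C at sea level."]
labels = [1, 0, 1, 0]

with torch.no_grad():
    for p in prompts:
        model(**tok(p, return_tensors="pt"))
handle.remove()

X = torch.cat(acts).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("train accuracy:", probe.score(X, labels))
```

Real probing work adds larger datasets, held-out evaluation, and causal checks, but the experimental loop stays about this short: build, run, read off a result, iterate.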
Ideal background:
Strong software engineering fundamentals
Experience with experimental ML / research systems
Comfort working close to model internals
Interest in interpretability, alignment, RL, or mechanistic understanding
PhD helpful, not required
This is not a role for scaling pipelines or maintaining production systems. It's for people who enjoy ambiguous problems, fast research cycles, and building new tools from first principles.
Interested? Apply and drop me a message!