Engineering Lead/Research Engineer, Benchmarking

Company:

Epoch AI

Location:

Calumet, PA, 15689

Posted:

May 12, 2025


Description:

Department: Epoch AI

Employment Type: Full Time

Location: Remote

Compensation: $150,000 - $300,000 / year

Description

Epoch AI is looking for a Research Engineer to join our Benchmarking team and an Engineering Lead to help manage it. In either role, you'll help us provide independent evaluations of leading AI models to enable researchers, developers, and policymakers to better understand AI development.

About the roles and team

Epoch AI is a leading research institute investigating trends in artificial intelligence, aiming to provide rigorous, accessible insights into AI development. A core part of our work is understanding the capabilities of state-of-the-art AI models through benchmarking.

We are seeking talented senior engineers to join (and lead the engineering efforts of) our Benchmarking team and play a crucial role in expanding and operating our AI Benchmarking Hub. This platform provides independent evaluations of leading AI models on challenging benchmarks, helping researchers, developers, and policymakers understand what AI systems can do and where they are headed. We are open to both senior and more junior hires.

An Engineering Lead for Benchmarking would drive the technical execution and strategy for our AI Benchmarking Hub. This is a hands-on leadership role in which you will own the engineering roadmap and actively contribute code to our evaluation infrastructure (using frameworks such as Inspect).

A Research Engineer on our Benchmarking team will focus on the practical implementation, execution, and improvement of our evaluation systems. You'll work hands-on with cutting-edge AI models and evaluation frameworks (primarily the Inspect library) to generate the data that powers the Benchmarking Hub.

In either role, your deep engineering expertise and operational focus will be vital to making our benchmarking timely, robust, scalable, and transparent, directly impacting the insights we provide to researchers, developers, and policymakers.

These roles are fully remote and we expect to be legally able to hire in many countries. If you are unsure whether we can hire in the country you are based in, please email . We are only considering full-time candidates.

Applications are rolling.

Please do not include a cover letter, photograph, or headshot of yourself, or any personal information that is not relevant to the role for which you're applying (including marital status, age, identity traits, etc.).

Key Responsibilities

Implement benchmarks: Implement new and existing AI benchmarks within our evaluation framework (primarily using the Inspect library) to expand the suite of capabilities we track.

Run evaluations: Execute evaluations of AI models across our benchmark suite, including rapid response when major models are released.

Maintain & improve our benchmarking infrastructure: Contribute to the development and maintenance of our benchmarking infrastructure, including adding support for new model APIs/formats, and ensuring the reliability and scalability of our evaluation pipeline.

Expand the benchmarking hub: Contribute technically to the improvement and expansion of the public-facing Epoch AI Benchmarking Hub, helping to make it a central go-to resource for model evaluation data.

Collaborate: Work closely with Epoch AI researchers and analysts to ensure evaluation data and outputs are accurate, insightful, and effectively integrated into our research products and publications.

Only for the Engineering Lead

Own and execute the engineering roadmap: Define, manage, and actively contribute to implementing the long-term engineering roadmap for our Benchmarking infrastructure, ensuring it supports research priorities and enables rapid evaluation cycles.

Lead and execute evaluations: Personally oversee and contribute to the execution of evaluations across our benchmark suite. Ensure a rapid response pipeline is in place and operational to benchmark major new models within tight timeframes (e.g., targeting initial results within days of release).

Team leadership & mentorship: Provide technical guidance and day-to-day management for our current benchmarking research engineer and future hires.

What we are looking for

Outstanding engineering skills: Possess a strong software engineering background with several years of professional experience building and maintaining complex systems. You are expected to regularly contribute high-quality, robust, and maintainable code and be comfortable diving deep into existing codebases and infrastructure. We expect most (but not necessarily all) strong candidates for the Engineering Lead position to have 10 years or more of engineering experience.

Mission-driven: You're motivated by Epoch AI's mission to provide rigorous, independent insight into key trends in AI. You want to deliver public, trustworthy evaluations of AI capabilities on challenging benchmarks, empowering researchers, policymakers, and the wider public to make well-informed decisions about AI.

AI domain expertise is a strong plus but not required: Hands-on experience running LLM evaluations, familiarity with evaluation frameworks like Inspect, and a solid grasp of current AI trends are a strong plus. However, outstanding engineering skills and an ability to learn quickly matter more than direct background in these areas.

Only for the Engineering Lead

Leadership & people-management experience: Successfully led small engineering teams, set priorities, and coached reports.

If you don't tick all these boxes but think you would be a great fit, please consider applying anyway!

What we offer

Compensation: Annual salary between $150k and $200k for the Research Engineer, $200k and $300k for the Engineering Lead, depending on experience and location.

Impactful work: Directly contribute to a leading resource for understanding AI progress, informing critical global discussions.

Cutting-edge field: Work at the forefront of AI, evaluating the most advanced models as they emerge.

Collaborative environment: Join a mission-driven team of researchers and engineers passionate about understanding AI trends.

Other benefits:

Fully remote environment, including flexible work hours and schedules for most roles.

Competitive global benefits program, including:

Comprehensive health insurance program, including supplemental benefits specific to your country, as available and mandated by local law.

Life insurance and pension plan, if applicable in your country.

Generous paid time off (PTO), including:

We don't set a specific limit on paid time off per year, and protect a minimum of 30 days off per year (including local public holidays and vacation time), pro-rated by contract length.

Unlimited (within reason) personal and sick leave

Parental leave: up to 6 months of a combination of paid and unpaid parental leave during the first 2 years after a child's birth or adoption, for permanent staff.

A flexible and generous expense policy for you to spend on equipment and a large range of productivity tools or learning/development opportunities you might find valuable, subject to regulations and manager approval.

Paid work trips, including 3 staff retreats per year and relevant conferences.

Access to our very well-equipped offices in Berkeley, California, including paid meals, snacks, gym, and more.

All staff, regardless of where they are based, have access to the office for at least 20 days each year.

Other benefits as allowed at the discretion of Epoch AI's leadership and local availability.

Please email if you have any questions about this role or accessibility requests.

While we welcome applicants from all time zones, we prefer candidates who can overlap with both UTC-8 (Pacific Time) and UTC (Greenwich Mean Time), as our team works in this range of time zones.

Please submit all of your application materials in English and note that we require professional level English proficiency.

For this position we prefer candidates who can travel: we hold three retreats per year, during which we record podcast episodes and work on other communication efforts.

Epoch is committed to building an inclusive, equitable, and supportive community for you to thrive and do your best work. We're committed to finding the best people for our team, so please don't hesitate to apply for a role regardless of your age, gender identity/expression, political identity, personal preferences, physical abilities, veteran status, neurodiversity or any other background.
