Responsibilities
- Build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals
- Build a lightning-fast basic evals infrastructure that schedules tasks with practically no added latency
- Figure out clever ways to reduce the fundamental sources of latency (building a version of Elicit, running it on a query, and evaluating it using LMs)
- Ensure ML engineers can kick off evals automatically on relevant commits, with results they can see at a glance and drill into
- Provide dashboards for product managers showing performance over time and what's going wrong in production
- Write well-architected code so other team members and ML engineers can understand and build on it
- Enable engineers to quickly add examples and run an eval when starting on a new feature
- Evaluate how well Elicit actually helps with decision-making in pharma, not just measure what's easy to measure
- Encode real knowledge about how pharma customers make decisions (for example, choosing appropriate gold standards)
- Provide appropriate statistical tests and confidence intervals so results are trustworthy
- Mentor the evals engineering intern
- Learn how people interact with the eval system to make it work better for them
- Understand what users want from Elicit so evals measure what matters
Requirements
- At least 3 years of experience as a professional software engineer
- Demonstrated experience building complex backend systems (e.g., backend for a complex website, data pipelines, etc.)
- Aptitude for and interest in evaluating how Elicit helps with pharma decision-making
Nice to Have
- Knowledge of statistics (e.g., for calculating power and credible intervals for evals)
- Experience with advanced Python (asyncio/trio and parallel processing strategies)
- Front-end experience and strong UX sensibility (you'll be building dashboards)
- TypeScript experience is a plus
- Experience building developer tools (ML engineers are one of your most important clients)
- Previous experience as a data engineer or working on AI infrastructure
- Knowledge of pharma/biomed
- Experience evaluating ML systems
- Experience building language-model-based systems (helps with understanding Elicit and how to evaluate it)
Benefits
- Fully covered health, dental, vision, and life insurance for you, and generous coverage for the rest of your family
- Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
- 401(k) with a 6% employer match
Compensation
For this role, we target starting ranges of:
- Career (L3): $140-170k + equity
- Senior (L4): $165-200k + equity
Work Arrangement
Hybrid
Team
Structure: You'll work closely with the evals team and mentor an evals engineering intern.
Additional Information
- The only in-person requirement is attending quarterly team retreats, typically held on the West Coast.
- Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
- A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
- $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools, take courses, purchase educational resources, or attend AI-focused conferences and events
- A team administrative assistant who can help you with personal and work tasks