Oakland, CA (or remote within US timezones) Hybrid $140-170k + equity

Elicit is hiring an Evaluation Engineer

Responsibilities

  • Build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals
  • Build a lightning-fast basic evals infrastructure that schedules tasks with practically no added latency
  • Figure out clever ways to reduce the fundamental sources of latency (building a version of Elicit, running it on a query, and evaluating it using LMs)
  • Ensure ML engineers can kick off evals automatically on relevant commits, with results they can see at a glance and drill into
  • Provide dashboards for product managers showing performance over time and what's going wrong in production
  • Write well-architected code so other team members and ML engineers can understand and build on it
  • Enable engineers to quickly add examples and run an eval when starting on a new feature
  • Evaluate how well Elicit actually helps with decision-making in pharma, rather than just measuring what's easy to measure
  • Encode real knowledge about how pharma customers make decisions (for example, choosing appropriate gold standards)
  • Provide appropriate statistical tests and confidence intervals so results are trustworthy
  • Mentor the evals engineering intern
  • Learn how people interact with the eval system to make it work better for them
  • Understand what users want from Elicit so evals measure what matters

Requirements

  • At least 3 years of experience as a professional software engineer
  • Demonstrated experience building complex backend systems (e.g., backend for a complex website, data pipelines, etc.)
  • Aptitude and interest in evaluating how Elicit helps with pharma decision-making

Nice to Have

  • Knowledge of statistics (e.g., for calculating power and credence intervals for evals)
  • Experience with advanced Python (asyncio/trio and parallel processing strategies)
  • Front-end experience and strong UX sensibility (you'll be building dashboards)
  • TypeScript experience is a plus
  • Experience building developer tools (ML engineers are one of your most important clients)
  • Previous experience as a data engineer or working on AI infrastructure
  • Knowledge of pharma/biomed
  • Experience evaluating ML systems
  • Experience building language-model-based systems (helps with understanding Elicit and how to evaluate it)

Benefits

  • Fully covered health, dental, vision, and life insurance for you, and generous coverage for the rest of your family
  • Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
  • 401(k) with a 6% employer match

Compensation

For this role, we target starting ranges of:

  • Career (L3): $140-170k + equity
  • Senior (L4): $165-200k + equity

Work Arrangement

Hybrid

Team

Structure: You'll work closely with the evals team and mentor an evals engineering intern.

Additional Information

  • The only in-person requirement is attending quarterly team retreats, typically held on the West Coast
  • Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
  • A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
  • $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools, take courses, purchase educational resources, or attend AI-focused conferences and events
  • A team administrative assistant who can help you with personal and work tasks
Required Skills
statistics, advanced Python, pharma/biomed
About company
Elicit
Elicit is an AI research assistant that uses language models to help researchers figure out what’s true and make better decisions. It helps expert researchers push the frontier in fields like biomedicine, health economics, and computer science.
Job Details
Department: Engineering
Category: Other
Posted: 4 months ago