Responsibilities
- Lead the assessment and definition of model quality benchmarks
- Manage end-to-end evaluation infrastructure, including validation datasets and performance indicators
- Develop and maintain authoritative 'golden sets' for each product to ensure consistent quality standards
- Drive initiatives to enhance model accuracy, precision, and recall across all offerings
- Define and monitor AI performance KPIs and customer-facing SLAs during scaling phases
- Design and implement automated quality assurance workflows
- Identify manual QA and data-processing tasks and replace them with automated solutions
- Minimize reliance on human review by improving model confidence and consistency
- Construct scalable pipelines for evaluating model outputs
- Develop standardized procedures to rapidly assess pilot programs and active customer implementations
- Maintain high accuracy levels despite increasing customer volume and product complexity
- Integrate real-world user feedback into model improvement cycles
- Collect and examine instances of model failure in production environments
- Rank accuracy issues by impact and collaborate with engineering teams to implement fixes
Benefits
- Shape the evolution of AI Copilots in a rapidly expanding and transformative market
- Exercise significant autonomy and influence as a foundational team member
- Collaborate with proactive, high-performing colleagues and large enterprise clients
Work Arrangement
Remote (SF, NYC)