Responsibilities

Lead the development and enhancement of a multi-tiered evaluation pipeline covering tool call validation, risk heuristics, and LLM-based transcript assessment.
Improve evaluation methods for full system performance across coordinated agents, RAG-enhanced responses, and multi-party voice interactions using provider-agnostic verification.
Enhance observability tools to expose evaluation metrics, identify performance regressions, and support data-informed quality decisions.
Develop real-time monitoring systems that detect live interaction issues, apply contextual interventions, and feed insights back into system improvements.
Collaborate with machine learning, product, and operations teams to convert real-world failures into automated test cases and strengthen evaluation coverage.
Create and manage test suites focused on adversarial scenarios and edge cases, including resistance to prompt injection and behavior under ambiguous user input.
Promote early integration of quality standards by embedding evaluation criteria into prompt design, defining behavioral acceptance criteria for new agent behaviors, and treating quality as a first-class concern throughout development.
Help design the orchestration of QA workflows, including background processing, Slack alerting, and risk assessment storage, to improve throughput, reliability, and developer experience.

Compensation

Not specified

Work Arrangement

Not specified

Team

Cross-functional team collaborating with ML engineers, product managers, and operations leads

Responsibilities

Own and extend our multi-layered eval pipeline and verification portfolio: deterministic quality checks on tool calls, risk-factor heuristics, and LLM-graded transcript evaluation.
Advance our capabilities to evaluate end-to-end system performance (across orchestrated agents, RAG-supported responses, multi-party voice conversations) with modular and auditable verification that is independent of any single model provider.
Drive improvements to our observability stack to surface eval metrics, detect regressions, and enable data-driven quality decisions across the team.
Build real-time monitoring and verification loops that catch issues in production interactions as they happen, intervening with context and feeding back for system refinement.
Partner with ML engineers, product managers, and operations leads to translate real-world failure modes into automated checks, closing the loop between production incidents and eval coverage.
Build and maintain adversarial and edge-case test suites covering prompt injection resistance, guardrail robustness, and graceful degradation under ambiguous patient inputs.
Champion “shift-left” quality practices: embed eval criteria into prompt engineering workflows, define acceptance criteria for new agent behaviors, and make quality a first-class concern in the development cycle.
Contribute to the design of our QA pipeline orchestration (background processing, Slack notifications, risk assessment persistence) to improve throughput, reliability, and developer experience.

Third Way Health is hiring a Senior QA Engineer
