Cambridge, Massachusetts, United States

Third Way Health is hiring a Senior QA Engineer

About the Role

Third Way Health is hiring a Senior QA Engineer to own and evolve the quality and evaluation infrastructure for our AI-powered patient engagement platform. This is a high-impact individual contributor role where you will define how quality is measured across automated and manual workflows and build the tooling to make that definition actionable.

What You'll Do

  • Own and extend our multi-layered evaluation pipeline and verification portfolio, including deterministic quality checks, risk-factor heuristics, and LLM-graded transcript evaluation.
  • Advance our capabilities to evaluate end-to-end system performance across orchestrated agents, RAG-supported responses, and multi-party voice conversations.
  • Drive improvements to our observability stack to surface evaluation metrics, detect regressions, and enable data-driven quality decisions.
  • Build real-time monitoring and verification loops that catch issues in production interactions and feed findings back into system refinement.
  • Partner with ML engineers, product managers, and operations leads to translate real-world failure modes into automated checks.
  • Build and maintain adversarial and edge-case test suites for prompt injection resistance, guardrail robustness, and graceful degradation.
  • Champion “shift-left” quality practices by embedding evaluation criteria into prompt engineering workflows and defining acceptance criteria for new agent behaviors.
  • Contribute to the design of our QA pipeline orchestration to improve throughput, reliability, and developer experience.

What We're Looking For

  • 5+ years of software or test engineering experience, with 3+ years focused on quality infrastructure for AI/ML or data-intensive systems.
  • Strong proficiency in Python for building test frameworks, evaluation pipelines, and API-level integration tests.
  • Demonstrated experience designing evaluation systems for LLM-based applications, with a clear understanding of the model as a generation layer, not the quality layer.
  • Familiarity with the architectural tradeoffs of relying on LLM outputs in production, including variance across model versions and prompt sensitivity.
  • Experience building extensible, rule-based validation systems that scale across a growing surface area of features.
  • Solid understanding of voice AI or conversational AI systems, including tool-calling patterns, transcript analysis, and interaction-level quality metrics.
  • Hands-on experience with observability and metrics instrumentation in production environments.
  • Excellent communication skills and the ability to collaborate effectively across engineering, product, and non-technical stakeholders.
  • Strong interest in healthcare innovation and building AI systems that meaningfully improve health outcomes.

Nice to Have

  • Experience building QA or evaluation systems in healthcare or other regulated environments, and familiarity with regulations and guidance such as HIPAA, GDPR, or FDA guidance.
  • Proven experience leading complex technical initiatives and mentoring junior engineers.
  • Experience building systems where quality guarantees live in the verification infrastructure rather than in any single model.
  • Familiarity with risk-scoring systems, anomaly detection, or production safety nets for autonomous AI agents.
  • Experience with AI safety testing, including adversarial evaluation, jailbreak testing, and bias detection in LLM outputs.
  • Hands-on experience with CI/CD pipelines for evaluation automation and infrastructure-as-code deployment patterns.
  • Experience with voice UI testing tools and platforms focused on evaluating speech generation and response quality.
  • Knowledge of accessibility testing and inclusive design principles.

Technical Stack

  • Python, pytest, FastAPI TestClient, Pydantic
  • LLM-based evaluation systems
  • Observability and metrics tooling
  • CI/CD pipelines (CircleCI, GitHub Actions)
  • Voice AI/conversational AI systems

Team & Environment

You will partner closely with ML engineers, product managers, and operations leads to translate real-world needs into robust quality infrastructure.

Required Skills

Python, pytest, FastAPI TestClient, Pydantic, LLM-based evaluation systems, Observability, CI/CD, CircleCI, GitHub Actions, Voice AI, conversational AI, test frameworks, API testing, quality infrastructure

About the Company

Third Way Health builds AI systems that help millions of patients access care faster, using next-generation technology in a regulated healthcare environment.

Job Details

Department: Quality Assurance
Category: qa_testing
Posted: 2 months ago