United States (Hybrid)

UP.Labs is hiring a Sr. AI Quality Engineer

About the Role

Own quality for a mission-critical AI system that processes complex freight transactions. In this role, you’ll ensure the reliability and precision of an AI-powered billing platform by building robust quality frameworks and leading deep technical investigations when issues arise.

What You’ll Do

  • Establish and refine a quality rubric that defines success and failure across key scenarios and exception types.
  • Build and manage golden datasets with real-world inputs, expected outputs, and customer-specific variations to benchmark system performance.
  • Conduct regular reviews of system outputs in development and production, identifying patterns, diagnosing root causes, and driving improvements into the product roadmap.
  • Design and run regression tests for model updates, logic changes, and new customer integrations.
  • Investigate quality incidents by tracing through email ingestion, parsing, prompts, model outputs, normalization, and final audit outcomes.
  • Analyze logs, traces, event histories, and data streams to pinpoint failures across distributed workflows and state transitions.
  • Produce clear, actionable reports with minimal reproductions, evidence, impact assessments, and recommended fixes.
  • Develop a standardized triage process and classification system for recurring quality issues.
  • Define and implement monitoring dashboards to track anomalies, error trends, and per-customer performance.
  • Collaborate with engineering and AI teams to enhance system observability, including traceability from input to final state.
  • Translate customer requirements into testable logic and identify gaps where real-world complexity exceeds current system modeling.

What We’re Looking For

  • Proven experience in roles combining quality assurance, deep technical investigation, and systems thinking—such as QA in distributed systems, product analysis with debugging focus, or LLM quality evaluation.
  • Hands-on experience assessing AI-generated outputs, including structured extraction, classification, tool use, and prompt pipelines.
  • Strong skills in debugging production systems using tools like Datadog, ELK, Honeycomb, OpenTelemetry, or Jaeger.
  • Proficiency in SQL or Python for data analysis and issue reproduction.
  • Familiarity with event-driven architectures, state machines, and distributed workflows—including handling retries, idempotency, and partial failures.
  • Ability to define clear requirements and convert ambiguous edge cases into structured test scenarios.
  • Comfort working in high-volume, complex environments with frequent edge cases.

Nice to Have

  • Background in freight, logistics, billing, or audit processes—especially with documents like BOLs, rate confirmations, or carrier invoices.
  • Experience designing evaluation metrics such as precision/recall, drift detection, or customer-specific scorecards.
  • Knowledge of workflow engines and distributed system failure modes.
  • Experience with annotation pipelines, taxonomy design, or human-in-the-loop QA systems.

Technology Environment

You’ll work with Datadog, ELK, Honeycomb, OpenTelemetry, Jaeger, SQL, Python, event-driven systems, state machines, distributed architectures, LLM inference, RAG, and prompt-based pipelines.

Culture & Expectations

This role thrives on ownership: when something breaks, you follow it to the root cause and drive resolution. You’re systematic in turning ambiguity into clarity and communicate effectively across product, engineering, machine learning, and operations teams.

Required Skills

SQL, Python, Datadog, ELK, Honeycomb, OpenTelemetry, Jaeger, AI/LLM Evaluation, Distributed Systems, Event-Driven Architecture, Workflows/State Machines, Debugging Production Issues, Log Analysis, Quality Assurance, Incident Triage

About company
UP.Labs
UP.Labs builds high-growth technology startups that enable faster, cleaner, and safer movement of people and goods. The stealth startup behind this role is building an AI-powered platform focused on billing, revenue integrity, and cash-flow automation for enterprise logistics operators.
Job Details
Department: Quality Assurance
Category: QA testing
Posted: 3 months ago