Location: Colombia (Remote)

Caseware is hiring an AI Test Architect

About the Role

As the AI Test Architect, you will shape the future of quality assurance by designing and deploying a next-generation 'Quality Intelligence' system powered by generative AI. This role is central to ensuring the integrity of AI-driven features across a large-scale, cloud-native SaaS platform used by hundreds of thousands of professionals worldwide.

Key Responsibilities

  • Design and implement a unified Quality Intelligence platform that leverages generative AI to forecast defect-prone areas, optimize test coverage, auto-generate test cases, and enable self-correcting test execution.
  • Define and lead the adoption of an enterprise-wide AI-first testing strategy, including methods for evaluating non-deterministic outputs, monitoring model drift, and detecting hallucinations throughout the development lifecycle.
  • Establish ethical and compliance standards for AI testing, aligned with evolving regulatory expectations.
  • Develop rigorous evaluation frameworks for internal AI agents and generative features, including red teaming, adversarial testing, and benchmarks focused on bias, prompt injection, jailbreaking, and goal alignment.
  • Build statistically sound evaluation pipelines using tools such as Langfuse, LangSmith, DeepEval, RAGAS, or Arize Phoenix, incorporating LLM-as-judge patterns and human-in-the-loop validation.
  • Create test harnesses for agentic behaviors, tool use, planning logic, multi-agent simulations, and runtime observability.
  • Integrate AI-powered testing into GitHub-based CI/CD workflows, enabling predictive flakiness detection, automated quality gates, and AI-generated test suites.
  • Design self-healing test frameworks by combining AI plugins with Playwright or Cypress, reducing maintenance overhead as UIs and models evolve.
  • Lead synthetic data generation, curate reference datasets, and implement AI-driven data masking to support high-fidelity, privacy-compliant testing at scale.
  • Collaborate with product, data science, ML engineering, and security teams to embed quality controls into AI feature development from inception.
  • Train and mentor QA teams to adopt AI-augmented testing practices through workshops, documentation, and community initiatives.
  • Champion AI quality standards across the organization, including dashboards that track DORA metrics alongside AI-specific indicators such as hallucination rate and red-team success rate.
  • Implement telemetry systems for AI quality, including drift detection, faithfulness scoring, and compliance monitoring, integrated with platforms like Langfuse.
  • Establish feedback mechanisms for model refinement, A/B testing safeguards, and proactive risk management in production environments.
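To make the evaluation-pipeline responsibilities above concrete, here is a minimal sketch of the LLM-as-judge pattern in Python. The judge is a hypothetical stub (a real pipeline would call an LLM via a platform such as Langfuse or DeepEval and parse its score); the token-overlap scoring, function names, and threshold are illustrative assumptions, not part of the role description.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    reference: str
    candidate: str

def judge_stub(prompt: str, reference: str, candidate: str) -> float:
    # Hypothetical judge: a real pipeline would call an LLM and parse a
    # numeric score. Here we approximate faithfulness as the fraction of
    # reference tokens that appear in the candidate answer.
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    return len(ref & cand) / len(ref) if ref else 0.0

def run_suite(cases: list[EvalCase],
              judge: Callable[[str, str, str], float],
              threshold: float = 0.5) -> dict:
    # Score every case, then report the mean and the failing prompts --
    # the shape a CI quality gate would consume.
    scores = [judge(c.prompt, c.reference, c.candidate) for c in cases]
    return {
        "mean_score": mean(scores),
        "failures": [c.prompt for c, s in zip(cases, scores) if s < threshold],
    }

cases = [
    EvalCase("capital of France?", "the capital is paris", "paris is the capital"),
    EvalCase("2 + 2?", "the answer is 4", "i am not sure"),
]
report = run_suite(cases, judge_stub)
```

In practice the stubbed judge would be swapped for a model-backed scorer, and `run_suite` output would feed the dashboards and quality gates described above.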

Required Qualifications

  • Minimum of 8 years of experience in Quality Engineering or Test Architecture within cloud-native SaaS environments.
  • At least 2 years focused on testing AI, ML, or LLM-based systems.
  • Strong technical foundation in AWS, including serverless architectures, microservices, and infrastructure-as-code using Terraform or CloudFormation.
  • Hands-on experience with GitHub CI/CD ecosystems.
  • Proven ability to architect and test LLM-powered applications using LangChain, LangGraph, LangSmith, or similar frameworks.
  • Expertise in modern test automation tools such as Playwright or Cypress, with practical experience integrating AI-based self-healing capabilities.
  • Proficiency in JavaScript/TypeScript and/or Python.
  • Firm grasp of core AI concepts including transformers, embeddings, RAG architectures, and evaluation trade-offs.
  • Experience with LLM evaluation platforms such as Amazon Bedrock (Evaluations, Prompt Management, Guardrails), DeepEval, RAGAS, Arize Phoenix, or Langfuse.
  • Track record of technical leadership and cross-functional influence.
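The self-healing automation mentioned in the qualifications above can be illustrated without a browser. This is a sketch of the fallback-locator pattern that AI self-healing plugins for Playwright or Cypress typically apply; the page object is a stubbed dict, and every selector and function name here is illustrative, not an API of either tool.

```python
# Self-healing locator pattern: try a ranked list of candidate selectors
# and report which one matched, so the suite can update itself when the
# primary selector breaks. The "page" dict stands in for a real
# Playwright page object.

def find_with_healing(page: dict, selectors: list[str]) -> tuple[str, str]:
    """Return (matched_selector, element) from the first selector that hits."""
    for sel in selectors:
        element = page.get(sel)  # stand-in for page.query_selector(sel)
        if element is not None:
            return sel, element
    raise LookupError(f"no selector matched: {selectors}")

# Simulated DOM after a UI refactor: the old test id is gone, but an
# AI-suggested fallback (a role-based selector) still resolves.
page = {"role=button[name='Save']": "<button>Save</button>"}
selectors = ["#save-btn", "[data-testid=save]", "role=button[name='Save']"]
matched, element = find_with_healing(page, selectors)
```

A real self-healing framework would additionally persist the successful fallback (or ask a model to propose new candidates) so the next run starts from the healed selector.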

Preferred Qualifications

  • Background in red teaming using tools like Cobalt Strike, Sliver, or Nmap.
  • Familiarity with adversarial testing methodologies and security-focused evaluation techniques.

Technology Environment

AWS, Terraform, CloudFormation, GitHub CI/CD, LangChain, LangGraph, LangSmith, Playwright, Cypress, JavaScript, TypeScript, Python, Amazon Bedrock (Evaluations, Prompt Management, Guardrails), DeepEval, RAGAS, Arize Phoenix, Langfuse, Cobalt Strike, Sliver, Nmap.

Work Mode

This is a fully remote position open to candidates based in Colombia.

Required Skills

Quality Engineering, Test Architecture, AWS, Terraform, CloudFormation, GitHub CI/CD, LangChain, LangGraph, LangSmith, Playwright, Cypress, JavaScript, LLM testing, AI/ML validation, cloud-native SaaS
About company
Caseware
Caseware is a leading global provider of audit, accounting, and financial reporting software solutions for professionals.
Job Details

Department: Data, AI, & Interoperability Platform - QA
Category: QA / Testing
Posted: 3 days ago