Poland Remote (Global)

Mindrift is hiring an Evaluation Scenario Writer - AI Agent Testing Specialist

About the Role

Mindrift is looking for an Evaluation Scenario Writer - AI Agent Testing Specialist to design structured test scenarios that evaluate the performance of LLM-based agents. You will create realistic simulations of human-performed tasks and define the gold-standard behavior against which agent actions are measured.

What You'll Do

  • Design structured test scenarios based on real-world tasks.
  • Define the golden path and acceptable agent behavior.
  • Annotate task steps, expected outputs, and edge cases.
  • Work with developers to test your scenarios and improve clarity.
  • Review agent outputs and adapt tests accordingly.

What We're Looking For

  • Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or other related fields.
  • 3+ years of relevant experience.
  • English proficiency at an advanced (C1) level or above.
  • Willing to learn new methods and able to switch quickly between tasks and topics.
  • Able to work with guidelines that are sometimes challenging and complex.
  • A laptop, a reliable internet connection, and available time.

Benefits & Compensation

  • Take part in a part-time, remote, freelance project that fits around your primary professional or academic commitments.
  • Work on advanced AI projects and gain valuable experience that enhances your portfolio.
  • Influence how future AI models understand and communicate in your field of expertise.

Work Mode

This is a global, remote opportunity.

Required Skills
AI Agent Testing, Test Scenario Design, Prompt Engineering, Critical Thinking, Analytical Skills, Written Communication, Attention to Detail, LLM Evaluation, Creative Writing, Quality Assurance
About company
Mindrift
Mindrift connects specialists with AI projects from major tech innovators. Their mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.
Job Details
Category: qa_testing
Posted: 8 months ago