About the Role

The role involves assessing the performance of large language models in educational contexts by identifying issues, categorizing errors, and providing structured feedback to improve model behavior and reliability.

Responsibilities

Evaluate outputs from language models for factual correctness and coherence
Identify harmful, biased, or inappropriate content in AI-generated text
Classify types of model errors including hallucinations and logical flaws
Follow detailed guidelines to score model responses consistently
Provide clear, actionable feedback to improve model training
Test model behavior across diverse educational prompts and scenarios
Document patterns in model failures for engineering review
Collaborate with researchers to refine evaluation criteria
Maintain high accuracy and attention to detail in assessments
Adapt quickly to updated instructions and testing protocols
Contribute to the development of new evaluation frameworks
Ensure alignment of model outputs with pedagogical goals
Report edge cases that reveal model limitations
Participate in calibration sessions with team members
Track and log evaluation results in shared systems
Support quality assurance across multiple AI features
Help prioritize issues based on severity and frequency
Review model updates for improvements or regressions
Maintain confidentiality of internal testing data
Engage in ongoing training to stay current with AI developments
Communicate findings clearly and concisely
Work independently while meeting deadlines
Contribute to a culture of continuous improvement
Follow ethical guidelines in all evaluations
Assist in creating realistic educational prompts for testing

Nice to Have

Master’s degree in education or related field
Experience working with large language models
Background in special education or diverse learning needs
Familiarity with K–12 curriculum frameworks
Prior work in AI ethics or content safety
Experience with annotation or labeling tasks
Knowledge of prompt engineering techniques
Exposure to educational technology products
Research experience in cognitive science or learning theory
Multilingual abilities

Compensation

$60,000 - $80,000 annually, commensurate with experience

Work Arrangement

Remote with flexible hours; some real-time collaboration required

Team

Small, agile team focused on AI-driven educational tools

What You’ll Be Doing

Review and score AI-generated responses to classroom-related prompts
Flag content that violates safety or accuracy standards
Participate in weekly team discussions to align on evaluation standards

Why This Role Matters

Your work directly improves the reliability of AI tools used by educators and students
You help ensure AI outputs are safe, factual, and appropriate for learning environments

MagicSchool AI is hiring an Associate LLM Quality Analyst
