Requirements
- 4+ years of professional experience in Machine Learning Engineering, Applied ML, Software Engineering (ML-focused), or related roles
- Strong proficiency in Python, with experience writing production-quality code and working with ML libraries (e.g., PyTorch, TensorFlow, scikit-learn)
- Experience training, evaluating, and iterating on ML models, with an emphasis on diagnosing failure modes rather than just optimizing metrics
- Strong understanding of ML evaluation: metrics design, test coverage, error analysis, and tradeoffs between correctness, robustness, and generalization
- Ability to debug complex ML system failures, including issues caused by data, evaluation artifacts, or underspecified requirements
- Comfort working with incomplete specifications and multiple valid solutions, especially in open-ended or real-world tasks
- Experience working with ML pipelines or systems, including training workflows, evaluation harnesses, or model-in-the-loop systems
Nice to Have
- Experience building or maintaining ML training and evaluation pipelines
- Familiarity with ML infra concepts (e.g., reproducibility, experiment tracking, model versioning)
- Experience working in tool-driven environments (e.g., programmatic evaluation, scripting, notebooks, or terminal-driven workflows)
- Exposure to LLM systems, including model evaluation, benchmarking, and prompt or agent behavior analysis
- Experience reasoning about multiple valid implementations and tradeoffs in engineering solutions
- Strong written communication skills for explaining system behavior, failures, and engineering decisions
Work Arrangement
Remote (Worldwide)
Additional Information
- Flexible hours with a minimum commitment of 20 hours per week
- Project length: 1–2 months, with potential to extend


