Responsibilities
- Lead the complete machine learning lifecycle, from exploratory research and system design through deployment, monitoring, and ongoing model retraining
- Develop and apply frameworks for training, evaluating, stress-testing, and monitoring ML/AI systems so that performance holds up over time
- Solve complex scalability and performance problems so that systems stay efficient, resilient, and consistent in quality and latency under real-world production load
- Guide technical strategy as a lead individual contributor and provide mentorship to less experienced engineering team members
- Help improve automated testing practices at the unit, integration, and functional levels
- Take full ownership of services and system components, including participation in on-call rotations and incident response