Responsibilities
- Architect and lead delivery of cross-product GenAI platform capabilities: LLM Proxy, model registry integrations, vendor abstraction, and cost/usage attribution.
- Own the design and scaling of evaluation and benchmarking frameworks (A/B tests, offline evaluation, continuous regression tests) used to gate model releases.
- Define company-wide standards for safety, tone, and reasoning evaluation; drive adoption of evaluation rubrics and automated checks.
- Identify systemic failure modes across products and model families; prioritize mitigations, monitoring, and retraining strategies in partnership with ML teams.
- Drive platform reliability, observability, and capacity planning for LLM services; implement rate limiting, throttling, and SLA practices.
- Lead efforts to enable agentic workflows and safe tool use, defining integration patterns and security boundaries.
- Partner with engineering leadership, product, research, and legal/policy teams to translate risk, cost, and quality tradeoffs into platform design decisions.
- Mentor senior engineers, coordinate cross-team roadmaps, and represent the platform in technical forums.
Requirements
- BS in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 8+ years of industry experience in backend, platform, or ML infrastructure engineering with major production responsibilities.
- Demonstrated experience with cloud-native infrastructure (Kubernetes, AWS/GCP/Azure) and production ML/LLM systems.
- Strong track record of building evaluation and monitoring tooling for ML systems.
Nice to Have
- Experience building model registries, feature stores, or inference platforms at scale.
- Background in agentic AI frameworks, workflow orchestration, or tool-using models.
- Prior experience influencing company-wide ML safety, trust, or quality frameworks.
- Advanced degree (MS/PhD) in ML, NLP, or a related field, or published research in relevant areas.


