About the Role
This role develops the core infrastructure that powers machine learning applications across services, enabling teams to train, deploy, and manage models efficiently at scale.
Responsibilities
- Design and build scalable infrastructure for machine learning workflows
- Collaborate with data scientists and researchers to operationalize models
- Optimize training and inference pipelines for performance and cost
- Develop tools to automate deployment, monitoring, and scaling of ML systems
- Ensure platform reliability, security, and compliance across environments
- Integrate new hardware and distributed computing technologies into the platform
- Support versioning, reproducibility, and experiment tracking for ML workflows
- Work closely with product teams to understand requirements and deliver solutions
- Improve data ingestion and processing frameworks for model training
- Contribute to architectural decisions for cloud-native and on-premises systems
- Maintain documentation and best practices for platform usage
- Troubleshoot and resolve issues in production ML environments
- Evaluate and adopt open-source and internal ML tools
- Drive improvements in observability and debugging capabilities
- Support model governance, including lineage and auditability
- Help define standards for model performance and quality assurance
- Participate in code reviews and system design discussions
- Mentor junior engineers and promote technical excellence
- Stay current with advancements in ML infrastructure and distributed systems
- Contribute to long-term roadmap planning for platform evolution
Nice to Have
- Master’s or PhD in computer science or a related field
- Direct experience scaling ML platforms in high-traffic environments
- Deep knowledge of Kubernetes and cloud-native architectures
- Hands-on experience with GPU-accelerated computing
- Contributions to open-source ML or infrastructure projects
- Experience with MLOps tools and platforms
- Background in systems performance tuning and resource optimization
- Familiarity with security practices in ML systems
- Prior work in gaming, media, or interactive entertainment
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid
Team
Part of a dedicated platform engineering team focused on machine learning systems within a global interactive technology environment
About the Team
- This group builds foundational systems that support machine learning initiatives across the organization, focusing on scalability, automation, and developer experience.
- Engineers work on challenges involving distributed training, real-time inference, and integration with diverse data sources and applications.
What We Value
- Technical rigor and attention to detail
- Collaborative problem solving
- Ownership of system performance and reliability
- Continuous learning and knowledge sharing
Available for qualified candidates


