About the Role

This role focuses on maintaining and improving the reliability of complex distributed systems by applying engineering principles to operations challenges, reducing operational toil, and enhancing system resilience.

Responsibilities

Design and implement scalable monitoring and alerting systems
Develop automation tools to reduce manual operational tasks
Respond to and resolve critical production incidents
Conduct root cause analysis for system outages
Collaborate with development teams to improve service reliability
Define and track key reliability metrics such as SLOs and SLIs
Participate in on-call rotations with rapid response expectations
Optimize system performance and availability
Implement and maintain disaster recovery procedures
Contribute to capacity planning and scalability assessments
Enforce best practices in configuration management
Integrate reliability into the software development lifecycle
Lead post-incident reviews and drive follow-up actions
Evaluate and adopt new technologies to improve system stability
Support deployment pipelines with reliability checks
Maintain documentation for system architecture and incident response
Promote a blameless incident culture
Work across time zones with global teams
Ensure compliance with security and operational standards
Drive adoption of observability practices across teams

Nice to Have

Master’s degree in a technical field
Experience supporting streaming media platforms
Knowledge of large-scale data processing systems
Background in security operations
Public speaking or conference presentation experience
Open source contributions in relevant domains
Experience with machine learning infrastructure
Familiarity with database reliability engineering

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model

Team

Part of a global technology organization supporting digital platforms

What We Do

We power digital experiences for millions of users worldwide through resilient, scalable infrastructure.
Our teams work on real-time systems that support content delivery, user engagement, and platform stability.

Why You’ll Love It

You’ll solve complex technical challenges at scale.
You’ll see the direct impact of your work on user experience and platform reliability.

Available for qualified candidates

The Walt Disney Company is hiring a Senior Systems Reliability Engineer
