Glendale, CA, USA On-site

The Walt Disney Company is hiring a Senior Systems Reliability Engineer

About the Role

This role focuses on maintaining and improving the reliability of complex distributed systems by applying engineering principles to operations challenges, reducing operational toil, and enhancing system resilience.

Responsibilities

  • Design and implement scalable monitoring and alerting systems
  • Develop automation tools to reduce manual operational tasks
  • Respond to and resolve critical production incidents
  • Conduct root cause analysis for system outages
  • Collaborate with development teams to improve service reliability
  • Define and track key reliability metrics such as SLOs and SLIs
  • Participate in on-call rotations with rapid response expectations
  • Optimize system performance and availability
  • Implement and maintain disaster recovery procedures
  • Contribute to capacity planning and scalability assessments
  • Enforce best practices in configuration management
  • Integrate reliability into the software development lifecycle
  • Lead post-incident reviews and drive follow-up actions
  • Evaluate and adopt new technologies to improve system stability
  • Support deployment pipelines with reliability checks
  • Maintain documentation for system architecture and incident response
  • Promote a blameless incident culture
  • Work across time zones with global teams
  • Ensure compliance with security and operational standards
  • Drive adoption of observability practices across teams

Nice to Have

  • Master’s degree in a technical field
  • Experience supporting streaming media platforms
  • Knowledge of large-scale data processing systems
  • Background in security operations
  • Public speaking or conference presentation experience
  • Open source contributions in relevant domains
  • Experience with machine learning infrastructure
  • Familiarity with database reliability engineering

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model

Team

Part of a global technology organization supporting digital platforms

What We Do

  • We power digital experiences for millions of users worldwide through resilient, scalable infrastructure.
  • Our teams work on real-time systems that support content delivery, user engagement, and platform stability.

Why You’ll Love It

  • You’ll solve complex technical challenges at scale.
  • You’ll see the direct impact of your work on user experience and platform reliability.

Available for qualified candidates

Required Skills

  • Windows Server
  • Linux
  • Kubernetes
  • Helm
  • GitLab CI
  • Git
  • AWS
  • Python
  • PHP
  • Ruby
  • CI/CD
  • Cloud Infrastructure

About the Company

The Walt Disney Company
The Walt Disney Company creates world-class experiences and entertainment.
Job Details
Posted 6 months ago