Florida, Uruguay Remote (Global)

Wikimedia Foundation is hiring a Senior Site Reliability Engineer

About the Role

We're looking for a Senior Site Reliability Engineer to help sustain and evolve the infrastructure behind one of the world's most visited platforms. You'll play a key role in ensuring reliability, performance, and scalability across a vast, distributed system used by millions daily. This position is central to maintaining operational excellence while advancing automation, observability, and resilience.

What You’ll Do

  • Manage and optimize production systems through deployment, configuration, and ongoing maintenance using modern DevOps practices
  • Design and implement automation for provisioning, scaling, and monitoring services using tools like Puppet and Kubernetes
  • Collaborate with engineering teams to shape scalable architectures and guide best practices in system design
  • Respond to incidents as part of a rotating on-call schedule, leading diagnosis, resolution, and post-mortem analysis to strengthen system resilience
  • Diagnose complex issues across layers—from network protocols to application performance—using deep knowledge of TCP/IP, HTTP, TLS, and DNS
  • Contribute to a culture of continuous improvement by identifying inefficiencies and driving automation initiatives
  • Mentor team members and share expertise across a globally distributed, asynchronous work environment
  • Travel once or twice a year for team gatherings and in-person collaboration

What We’re Looking For

  • At least six years of experience in site reliability, systems engineering, or DevOps roles within large-scale environments
  • Strong scripting ability in Python, Bash, or similar, with hands-on experience in configuration management (especially Puppet)
  • Proven skill in Linux system administration, particularly on Debian-based systems, including package management and kernel-level troubleshooting
  • Deep understanding of distributed systems, caching architectures, and performance optimization
  • Experience with incident response, root cause analysis, and implementing preventive measures
  • Excellent written and verbal communication skills in English, with the ability to work independently across time zones

Nice to Have

  • Background in tuning Linux kernels for high-throughput services
  • Familiarity with caching proxies such as Varnish, Nginx, or Envoy
  • Experience with monitoring and alerting stacks like Prometheus and Grafana
  • Contributions to open-source projects or active participation in developer communities
  • Knowledge of PHP, HHVM, Redis, or MediaWiki ecosystems
  • Experience defining and managing service-level objectives (SLOs) across teams

Our Environment

We operate as a remote-first organization with team members across more than 40 countries. All code, configuration, and documentation are publicly accessible, reflecting our commitment to open-source principles. Our culture values diversity, transparency, and continuous learning. We prioritize equitable compensation, inclusive hiring, and accessibility for all applicants and employees.

Required Skills
Python, Go, Bash, Ruby, Puppet, Ansible, Kubernetes, Linux, Debian, TCP/IP, HTTP, TLS, DNS, Distributed Caching, Scripting
About the Company
Wikimedia Foundation
The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge.
Job Details
Category: Infrastructure
Posted: a month ago