About the Role
We are looking for a Site Reliability Engineer (SRE) dedicated to keeping cloud-based infrastructure dependable. You will support Red Hat's software production services across our hybrid cloud environment, working closely with development, quality engineering, and release engineering teams to keep our service infrastructure healthy. Day to day, you will set up service monitoring, improve automation, implement security protocols, and resolve a range of service issues. You will also engage with professional communities, help shape the design of our hybrid cloud platform, and share accountability for defining and tracking Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
In this position, you'll be required to promptly respond during service disruptions and participate in learning sessions focused on improving service resilience. Join our team committed to developing world-class open source software.
What You'll Accomplish
- Operate within a globally distributed team providing continuous support through strategic time zone coverage and structured on-call rotations
- Address service incidents using established procedures, investigate disruption origins, and coordinate resolution across multiple service teams
- Participate in incident review processes and implement corrective measures
- Manage and configure service infrastructure
- Proactively reduce operational complexity by automating repetitive and error-prone processes
- Coordinate with other Red Hat technical teams to ensure cloud deployments meet rigorous quality standards
- Develop comprehensive monitoring, alerting, and escalation strategies for infrastructure performance and availability issues
- Collaborate with service owners to establish, implement, and maintain precise SLIs and SLOs
Required Qualifications
- Proficiency in OpenShift administration
- Advanced Linux administration skills
- Foundational understanding of AWS technologies
- Experience with CI/CD platforms such as Tekton and Pipelines as Code, and possibly GitHub Actions or Jenkins
- Expertise in automation tools such as Ansible or Terraform
- Familiarity with open source monitoring technologies (Grafana, Prometheus, OpenTelemetry)
- Exceptional English communication capabilities for effective global team collaboration
Advantageous Skills
- Prior experience with SRE practices and methodologies
- Software development background in Python or Go
About Red Hat
Red Hat is a global leader in enterprise open source software solutions, leveraging community-driven approaches to deliver cutting-edge Linux, cloud, container, and Kubernetes technologies. Spanning 40+ countries, our workforce operates flexibly across various work environments. We cultivate an inclusive culture where innovative ideas are welcomed regardless of an individual's role or tenure.
Inclusion at Red Hat
Our organizational culture embodies open source principles of transparency, collaboration, and inclusivity. We believe transformative ideas can emerge from anywhere, empowering diverse perspectives to challenge conventions and drive innovation. We are committed to providing equal opportunities and celebrating every individual's unique contributions.
Equal Opportunity Policy
Red Hat is an equal opportunity employer, evaluating candidates without discrimination based on race, color, religion, gender, sexual orientation, national origin, age, veteran status, disability, or any legally protected characteristic.
Remote (Global)
Red Hat is hiring a Site Reliability Engineer