As a Site Reliability Engineer III, you will play a central role in ensuring the resilience and performance of critical systems. You'll design and maintain scalable infrastructure using cloud technologies, focusing on automation, reliability, and proactive problem resolution. Your work will directly influence system uptime, incident management, and long-term engineering standards.
Key Responsibilities
- Lead the design and implementation of reliable, scalable systems using infrastructure as code and configuration management
- Collaborate with engineering teams to build continuous integration and delivery (CI/CD) pipelines into development workflows
- Define and monitor service level objectives (SLOs) and service level indicators (SLIs) to detect and resolve issues before they affect users
- Drive incident response for major outages, including root cause analysis and follow-up through blameless postmortems
- Advocate for and implement site reliability best practices across teams and platforms
- Work closely with technical stakeholders to troubleshoot complex system behaviors across network, compute, and application layers
- Apply chaos engineering principles and disaster recovery strategies to strengthen system resilience
Qualifications
You bring a foundation in software engineering and systems reliability, with hands-on experience in cloud environments and automated operations. You are skilled in writing code to manage infrastructure and improve system observability.
- 3+ years of experience in software engineering, infrastructure support, or site reliability roles
- Proficiency in at least one programming language, such as Python or Java (for example, with Spring Boot)
- Experience with observability tools including Grafana, Prometheus, Datadog, Splunk, or Dynatrace
- Familiarity with CI/CD platforms such as Jenkins or GitLab
- Working knowledge of containerization and orchestration with Docker, Kubernetes, or ECS
- Understanding of service level management, networking fundamentals, and infrastructure-as-code tools like Terraform
- Ability to communicate technical concepts clearly and lead initiatives with minimal oversight
Preferred Experience
- Direct experience with AWS, Azure, or GCP cloud platforms
- Hands-on work with GitHub and collaborative code review processes
- Experience automating operations using Python or similar scripting languages
- Familiarity with collaboration, IT service management, and event management tooling such as Jira, Confluence, ServiceNow, or Netcool
- Leadership or mentoring experience in SRE or DevOps environments
Work Environment
This role operates in a hybrid model, combining office and remote work to support collaboration and flexibility. The organization values inclusive practices and provides accommodations to support diverse needs.
Compensation & Benefits
Base salary is determined by role scope, experience, skills, and location. Eligible positions may include discretionary incentive compensation in the form of cash or forfeitable equity. The benefits package includes comprehensive health coverage, retirement planning, mental health resources, tuition reimbursement, backup childcare, and financial coaching.


