US, Canada, Europe Remote (Global)

AuthZed is hiring an Engineering Manager: SRE

AuthZed is looking for an Engineering Manager: SRE to lead the team responsible for the reliability, scalability, and performance of our infrastructure as we grow globally. This is a hands-on leadership role where you will manage and develop a team of SREs while staying deeply engaged with production systems, incident response, and platform architecture.

What You'll Do

  • Lead a global team of Site Reliability Engineers delivering infrastructure automation, observability, and operational scalability across multi-cloud, multi-region Kubernetes architectures.
  • Recruit, hire, onboard, and develop engineers while elevating the overall strength of the team.
  • Act as a player-coach by contributing to critical projects while mentoring engineers and supporting their professional growth.
  • Participate in on-call rotations at a sustainable level to stay grounded in real operational issues.
  • Guide project planning by defining milestones, identifying dependencies, and working toward timely delivery.
  • Identify toil and lead initiatives to eliminate it through engineering solutions.
  • Drive automation and platform engineering: safer deploys, progressive delivery, guardrails, and paved paths that reduce toil.
  • Collaborate with product and engineering to ship features such as self-service workflows, and to set infrastructure-as-code expectations, with reliability built in.
  • Serve as a senior escalation point for complex incident triage and root cause analysis.

What We're Looking For

  • 10+ years of experience in infrastructure, SRE, or platform engineering roles.
  • 5+ years of team management or technical leadership in SRE or Platform Engineering.
  • Experience managing distributed teams across US, Canada, EU, and global time zones.
  • Experience leading or mentoring SRE/Infrastructure/Platform teams in a production SaaS environment.
  • Strong leadership skills with the ability to mentor and coach senior-level engineers.
  • Strong grasp of SRE fundamentals: SLOs/SLIs, error budgets, incident management, capacity planning, and operational excellence.
  • Extensive experience with AWS, GCP, and Azure managed services.
  • Strong programming skills and experience writing production-quality automation or tooling (e.g., Go, Python, Bash).
  • Hands-on experience with Kubernetes, Kubernetes Operators/Controllers, containerized workloads, and Infrastructure as Code (Terraform, Pulumi).
  • Experience with monitoring and observability systems (e.g., Prometheus, Grafana, logging/tracing pipelines).
  • Excellent communication: can translate reliability tradeoffs to product/exec stakeholders and write crisp incident/postmortem artifacts.
  • Proven ability to translate operational pain points into engineering deliverables.

Nice to Have

  • Experience working with or integrating AI-powered systems or tooling.
  • Experience operating multi-tenant or high-isolation customer environments.
  • Familiarity with distributed databases and performance tuning at scale.
  • Experience building internal developer platforms or paved paths.

Technical Stack

  • Cloud: AWS, GCP, Azure
  • Infrastructure: Kubernetes, Terraform, Pulumi
  • Observability: Prometheus, Grafana
  • Programming: Go, Python, Bash

Benefits & Compensation

  • Competitive salary based on experience.
  • Stock options at an early-stage startup.
  • Comprehensive benefits including healthcare (US-based) and other insurance.
  • Twice-yearly travel for team offsites focused on team bonding, collaboration, and having fun.

Work Mode

This is a global remote role open to candidates in the US, Canada, and Europe.

AuthZed celebrates the representation of diverse perspectives and backgrounds as a catalyst for creating an inclusive work environment.

Required Skills
AWS, GCP, Azure, Kubernetes, Terraform, Pulumi, Prometheus, Grafana, Go, Python, SRE, Site Reliability Engineering, Infrastructure as Code, Monitoring, Distributed Systems
About company
AuthZed
We are the creators and maintainers of SpiceDB and the authorization infrastructure that companies around the world depend on. We are a Series A company, fixing broken access control with products that eliminate complex permission management while delivering enterprise-scale performance and consistent access control.
Job Details
Category infrastructure
Posted 3 months ago