Ottawa, Ontario, Canada | Remote (City) | CAD 129,500 - 170,100 Yearly

Ericsson is hiring a Site Reliability Engineer

Ericsson is looking for a Senior Site Reliability Engineer to champion the reliability, availability, performance, and scalability of our mission-critical services. In this senior role, you will partner with development and operations teams to guide system design and provide leadership in incident response.

What You'll Do

  • Serve as a technical leader ensuring production service reliability, scalability, and performance.
  • Collaborate with development teams to embed operability and automation into system architecture.
  • Lead high-severity incident response, driving resolution and coordinating stakeholder communications.
  • Champion root cause analysis and postmortems; ensure remediation is implemented and verified.
  • Design and maintain sophisticated monitoring, alerting, deployment, and infrastructure automation systems.
  • Oversee creation and regular review of operational runbooks/playbooks; lead resilience and chaos testing exercises.
  • Drive service lifecycle processes, including operational readiness, onboarding, and decommissioning.

What We're Looking For

  • B.Sc. or M.Sc. degree in a relevant area, or equivalent experience.
  • 7–10+ years in systems engineering, DevOps, or SRE roles, with at least 3 years in a senior/lead capacity driving reliability initiatives.
  • Expert knowledge of SRE principles: SLIs, SLOs, error budgets, and reliability engineering methodologies.
  • Advanced Linux systems administration and troubleshooting skills, spanning cloud (AWS/Azure/GCP) and on-premises environments.
  • Extensive production experience with Kubernetes and container ecosystems (Docker, CRI).
  • Proficiency with Infrastructure as Code (Terraform, CloudFormation, Ansible) and automation scripting (Python, Go, Bash).
  • Strong background in designing/operating CI/CD pipelines, automated deployments, and rollout strategies (canary, blue-green).
  • Expertise with observability tools such as Prometheus, Grafana, ELK/EFK, Splunk, plus distributed tracing frameworks (Jaeger, Zipkin, OpenTelemetry).
  • Solid networking skills (TCP/IP, routing, load balancing) and security best practices (TLS, identity, secrets management).
  • Demonstrated thought leadership in designing and operating complex distributed systems.
  • Proven ability in capacity planning, performance tuning, profiling, and cost optimization at scale.
  • Understanding of telecom architectures (IMS, 4G/5G core concepts) and carrier-grade availability standards.
  • Ability to take command during incidents, coordinating cross-team responses in high-pressure situations.
  • Ability to lead structured problem-solving for deep root cause analysis, with actionable follow-through.
  • Experience establishing operational standards, best practices, and governance for reliability engineering across teams.
  • Exceptional communication skills that bridge technical and business contexts and influence senior stakeholders.
  • Mentorship and coaching for junior and mid-level engineers; fostering a culture of reliability-first thinking.
  • Strategic decision-making under pressure, balancing innovation with risk management.
  • Initiative to identify systemic risks and champion enterprise-grade improvements.

Nice to Have

  • Experience with OSS/BSS, network management tooling, and telecom protocols.
  • Knowledge of regulatory/compliance constraints in telecom deployments.
  • Reliability-first, automation-first, and risk-aware approach; skilled at balancing speed and safety in delivery.
  • Advanced cloud or Kubernetes certifications (AWS Professional, Azure Expert, GCP Professional, CKA/CKAD).
  • SRE leadership training, or incident response or chaos engineering certifications.

Technical Stack

  • Operating Systems: Linux
  • Cloud: AWS, Azure, GCP
  • Containers & Orchestration: Kubernetes, Docker, CRI
  • Infrastructure as Code: Terraform, CloudFormation, Ansible
  • Scripting & Languages: Python, Go, Bash
  • Observability: Prometheus, Grafana, ELK/EFK, Splunk, Jaeger, Zipkin, OpenTelemetry

Work Mode

This is a local position based in Ottawa, Canada.

Ericsson is proud to be an Equal Opportunity employer.

Required Skills
Linux, AWS, Azure, GCP, Kubernetes, Docker, CRI, Terraform, CloudFormation, Ansible, SRE, SLIs, SLOs, DevOps, Systems Engineering
About company
Ericsson
Ericsson builds advanced telecommunications solutions and networks, enabling connectivity and innovation across industries. The company focuses on developing next-generation technologies including 5G, cloud infrastructure, and AI-driven network services.
Job Details
Department: Information Technology
Category: Infrastructure
Posted: 2 months ago