London, or elsewhere in the United Kingdom

Xceptor is hiring a Site Reliability Engineer

Xceptor is hiring a Site Reliability Engineer to join a cross-cutting function that partners with tribes across the company to make services reliable, performant, secure, and operable in production. This is an AI-first role where you will use AI routinely to accelerate investigation, diagnostics, runbook creation, and automation, while embedding reliability into the delivery process from the start.

What You'll Do

  • Contribute at the tribe level to service reliability, performance, and operability.
  • Help build and run the reliability system: observability standards, incident response practices, runbooks, and automation.
  • Partner closely with Software Engineering, QA, Platform Engineering, and Senior/Lead SREs.
  • Own well-scoped operational improvements end-to-end, from design and implementation through testing, rollout, and measurement.
  • Contribute to defining and improving SLIs/SLOs and service health signals, aligned to customer outcomes.
  • Implement reliability improvements within established patterns like timeouts, retries, graceful degradation, and safe failure modes.
  • Support capacity and performance work, including basic baselining, load investigation, and scaling hygiene.
  • Help maintain operational quality across production and staging environments and improve environment consistency.
  • Participate in incident response and on-call rotations, contributing to triage, mitigation, and recovery.
  • Produce clear post-incident notes and support root cause analysis, focusing on actions that prevent recurrence.
  • Create and improve runbooks and playbooks so that incidents can be resolved faster and more consistently.
  • Help improve change safety through practical release/readiness checks and operational guardrails.
  • Implement and improve observability for services: logs, metrics, traces, dashboards, and alerting aligned to standards.
  • Tune alerts to reduce noise and improve actionability; help manage flakiness and false positives.
  • Build and maintain service health dashboards that support quick diagnosis and release confidence.
  • Work with QA and Engineering to align operational signals with end-to-end journey health.
  • Automate repetitive operational tasks and reduce toil through scripts, tooling, and pipeline improvements.
  • Contribute to deployment automation and reliability guardrails in CI/CD, working with Platform Engineering.

Team & Environment

You will be part of a cross-cutting function that partners with tribes across Xceptor, embedding reliability practices directly into their workflows and systems.

Xceptor fosters a company culture built on Client Centricity, One Team, and Impactful Work.

Required Skills

AWS, Azure, Kubernetes, Docker, Terraform, CI/CD, GitLab, GitHub Actions, Prometheus, Grafana, Python, Bash, Linux, Networking, Security

About the Company

Xceptor builds software for data manipulation. It sources data from wherever it flows, then curates, normalises, validates, repairs, and enriches that data so it reaches its destination in a reliable and consistent format. The company specialises in financial services, enabling business users to solve their data challenges themselves.

Job Details

Department: Information Technology
Category: Infrastructure
Posted: 2 months ago