Responsibilities
- Contribute at tribe level to reliability, performance, and operability. Help build and run the reliability system: observability standards, incident response practices, runbooks, and automation that reduces toil and improves service health over time.
- Partner closely with Software Engineering, QA, Platform Engineering, and Senior/Lead SREs to embed reliability into delivery without becoming a bottleneck. Own well-scoped operational improvements end-to-end (design, implement, test, roll out, measure) and steadily increase your scope and independence.
- Use AI routinely to accelerate investigation, diagnostics, runbook creation, infrastructure automation, and operational reporting, while staying accountable for verification and safe operation. This role exists to make reliability measurable and repeatable, reduce operational toil through automation, and enable fast delivery without compromising safety, control, or customer trust.
Requirements
- Experience as an SRE / DevOps / Production Engineer (typically 2–5 years).
- Experience supporting cloud services and operational automation in production environments; Azure experience is beneficial.
- Experience contributing to CI/CD, IaC, and observability practices in a delivery team.
- Strong academic background, including a degree in a STEM discipline, or equivalent experience.
- Uses AI to accelerate investigation, automation drafts, and runbook creation, and verifies outputs before use.
- Can follow and contribute to repeatable operational workflows and templates that improve reliability over time.
- Understands and mitigates AI risks in operations (unsafe actions, false confidence, confidentiality).
- Calm, pragmatic, and reliable; communicates clearly during incidents and operational issues.
- Outcome-focused with a bias for automation and systemic fixes over manual effort.
- Collaborative and receptive to feedback; grows quickly in a high-tempo environment.
- Customer-aware mindset suitable for regulated, mission-critical environments.
Nice to Have
- Experience supporting and improving production services with reliability and performance expectations.
- Working knowledge of cloud and cloud-native operations (Azure preferred), and the fundamentals of running services safely.
- Experience with IaC and automation (tooling/framework aligned to your stack), with good review and change discipline.
- Familiarity with CI/CD and deployment practices; able to improve pipelines and release safety under guidance.
- Practical observability skills: logs/metrics/traces, dashboards, and alert tuning.
- Comfortable with scripting and automation (e.g., PowerShell, CLI tooling).
Additional Information
- Xceptor works with clients in financial services, and our offers of employment are subject to the satisfactory completion of background checks, which include criminal record checks and credit reference checks.
- If you have any employment gaps exceeding three months within the last six years, we will request additional information and evidence to clarify those periods.


