Eden Prairie, Minnesota, United States | Hybrid | USD 134,600 - 230,800 yearly

Optum Tech (UnitedHealth Group) is hiring a Principal Site Reliability Engineer

Optum Tech, a part of UnitedHealth Group, is seeking a Principal Site Reliability Engineer (SRE) to lead the design and implementation of resilient, observable, and high-performing systems. This role is for a strategic thinker who thrives in complex environments and is passionate about reliability, automation, and innovation at the intersection of SRE and AI.

What You'll Do

  • Lead the implementation and standardization of OpenTelemetry across services to enhance observability and traceability
  • Define and enforce SLIs, SLOs, and error budgets in collaboration with engineering teams
  • Design and execute resiliency tests, disaster recovery (DR) exercises, and chaos engineering game days to proactively identify and mitigate system weaknesses
  • Develop automated failure injection and recovery validation tools
  • Enhance CI/CD pipelines with automated performance and load testing to ensure reliability and scalability before production deployment
  • Collaborate with DevOps and QA to integrate performance benchmarks into release gates
  • Drive cloud adoption strategies with a focus on resiliency patterns, multi-region failover, and cost-effective scaling
  • Partner with cloud architects to design fault-tolerant infrastructure and services
  • Explore and implement AI-driven solutions for anomaly detection, incident prediction, and intelligent alerting
  • Innovate with AI agents to automate routine SRE tasks and improve incident response efficiency
  • Serve as a thought leader and mentor for SRE best practices across the organization
  • Lead cross-functional initiatives to improve system reliability, developer productivity, and customer experience

What We're Looking For

  • 10+ years of experience in software engineering, DevOps, or SRE roles, with at least 3 years in a principal or lead capacity
  • 5+ years of experience with CI/CD tooling (e.g., Jenkins, GitHub Actions, ArgoCD)
  • 5+ years of experience with container orchestration on cloud platforms (Azure or AWS preferred)
  • 3+ years of deep experience in observability and monitoring tools (e.g., OpenTelemetry, Prometheus, Grafana, Datadog)
  • 3+ years of experience with chaos engineering, DR planning, and performance testing

Nice to Have

  • Bachelor's degree in Computer Science, Information Technology or related field
  • Hands-on experience with infrastructure as code (e.g., Terraform, Pulumi) and automation tools such as Ansible and Helm
  • Experience with service mesh technologies (e.g., Istio, Linkerd)
  • Familiarity with AI/ML concepts and experience applying them in operational contexts
  • Proven communication and leadership skills

Technical Stack

  • Observability: OpenTelemetry, Prometheus, Grafana, Datadog
  • CI/CD: Jenkins, GitHub Actions, ArgoCD
  • Cloud: Azure, AWS
  • Infrastructure as Code: Terraform, Pulumi, Ansible, Helm
  • Service Mesh: Istio, Linkerd

Benefits & Compensation

  • Compensation: $134,600 to $230,800 annually
  • Comprehensive benefits package
  • Incentive and recognition programs
  • Equity stock purchase
  • 401k contribution

Work Mode

This is a hybrid position open to candidates in the United States.

UnitedHealth Group is an Equal Employment Opportunity employer under applicable law, and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.

Required Skills
OpenTelemetry, Prometheus, Grafana, Datadog, Jenkins, GitHub Actions, ArgoCD, Azure, AWS, Terraform, CI/CD, Container Orchestration, Observability, Chaos Engineering, Disaster Recovery
About company
Optum Tech (UnitedHealth Group)
Optum Tech is a global leader in health care innovation. Our teams develop cutting-edge solutions that help people live healthier lives and help make the health system work better for everyone. From advanced data analytics and AI to cybersecurity, we use innovative approaches to solve some of health care’s most complex challenges.
Job Details
Department: Information Technology
Category: Infrastructure
Posted: 2 months ago