Location: France, Germany, Spain, or United Kingdom · Remote (Global)

Upsun (formerly Platform.sh) is hiring a Site Reliability Engineer

Platform.sh is seeking a Site Reliability Engineer to join our Upsun team. In this role, you will help the team move from traditional Cloud Operations to an automation-driven SRE model. You will focus on improving infrastructure, automating operational tasks, and streamlining processes to make our systems more reliable, scalable, and efficient.

What You'll Do

  • Refine monitoring and observability using tools like Prometheus, Grafana, and ELK Stack to ensure system visibility aligns with business objectives.
  • Automate deployments and workflows by transitioning manual processes to automated solutions with IaC tools like Terraform and Ansible.
  • Optimize CI/CD pipelines and their architecture so that releases are fast, reliable, and scalable.
  • Manage and scale cloud-based systems on platforms like AWS, GCP, and Azure while minimizing technical debt.
  • Support incident response and lead post-mortem analysis to ensure continuous improvement and knowledge sharing.
  • Collaborate with cross-functional engineering and product teams to integrate reliability practices into the development lifecycle.
  • Drive technical innovation by introducing new tools, technologies, and practices that improve system reliability, performance, and scalability.

What We're Looking For

  • A solid understanding of DevOps, Cloud Operations, or SRE principles, with a focus on reliability and scalability.
  • Advanced hands-on experience with Linux internals, including performance tuning, kernel configurations, and troubleshooting.
  • Proficiency in programming languages such as Go (preferred) or Python for building tools and automating processes.
  • Strong scripting skills (Python, Bash, or Go) for automating workflows and managing infrastructure.
  • Extensive experience with cloud platforms like AWS, GCP, and Azure, along with expertise in monitoring/logging frameworks and CI/CD pipelines.
  • Strong problem-solving skills, system design experience, and the ability to collaborate effectively across teams.

Nice to Have

  • Hands-on experience with Docker, Kubernetes, and other container and orchestration technologies for building and deploying scalable applications.

Technical Stack

  • Monitoring/Observability: Prometheus, Grafana, ELK Stack
  • Infrastructure as Code: Terraform, Ansible
  • Cloud Platforms: AWS, GCP, Azure
  • Languages: Go, Python, Bash
  • Containerization: Docker, Kubernetes

Team & Environment

You will report to the Director, Site Reliability Engineering.

Benefits & Compensation

  • Flexible PTO
  • Comprehensive healthcare coverage (UK, France, Spain)
  • Company stock options
  • Professional development budget
  • Office equipment budget
  • Wellness budget
  • Annual team gatherings
  • Internet reimbursement
  • Inclusive parental leave
  • Remote work travel program

Work Mode

This is a fully remote position open to candidates based in France, Germany, Spain, or the United Kingdom.

Platform.sh is an equal opportunity employer.

Required Skills
Prometheus, Grafana, ELK Stack, Terraform, Ansible, AWS, GCP, Azure, Go, Python, Linux, DevOps, SRE

About Upsun (formerly Platform.sh)

Upsun is the cloud application platform built for hybrid teams where AI agents write and test code and humans focus on solving problems. Developers, DevOps engineers, and platform teams use Upsun to build, ship, and scale confidently without wrestling with backend infrastructure.

Job Details

  • Department: Information Technology
  • Category: Infrastructure
  • Posted: 2 months ago