Europe

Photoroom is hiring a Site Reliability Engineer

About the Role

You will build and maintain reliable systems, combining software engineering and operational expertise to support scalable services.

Responsibilities

  • Design and manage scalable infrastructure for high availability
  • Implement automated deployment and rollback systems
  • Monitor system performance and proactively address issues
  • Respond to incidents and lead resolution efforts
  • Optimize system reliability and reduce downtime
  • Develop tools to streamline operations and reduce manual work
  • Collaborate with engineering teams to improve service resilience
  • Enforce observability standards across services
  • Manage on-call rotations and post-incident reviews
  • Improve CI/CD pipelines for faster, safer releases
  • Conduct capacity planning and resource forecasting
  • Support security and compliance requirements in infrastructure
  • Troubleshoot production issues across multiple layers
  • Drive adoption of best practices in reliability engineering
  • Contribute to disaster recovery planning and testing
  • Evaluate and integrate new technologies for operational efficiency
  • Maintain documentation for systems and procedures
  • Work closely with developers to refine service design
  • Ensure systems meet SLOs and error budget policies
  • Automate routine operational tasks
  • Analyze system metrics to identify trends and risks
  • Promote a blameless culture during incident investigations
  • Scale infrastructure in response to product growth
  • Integrate feedback loops for continuous improvement
  • Support cloud cost optimization initiatives

Nice to Have

  • Experience in fast-growing startups or high-traffic environments
  • Background in machine learning infrastructure
  • Familiarity with edge computing or CDN technologies
  • Contributions to open-source projects
  • Experience with real-time data processing systems

Compensation

Competitive salary and equity

Work Arrangement

Remote-first with team hubs

Team

Collaborative engineering team focused on infrastructure and product reliability

Our Stack

We use Kubernetes for orchestration, Terraform for infrastructure provisioning, Prometheus and Grafana for monitoring, and a mix of Python and Go for service development.

Impact

Your work will directly influence the stability and performance of a widely used visual media platform, enabling seamless user experiences at scale.

Available for qualified candidates

Required Skills
AWS, Kubernetes, Terraform, Prometheus, Grafana, Go, Python, CI/CD, Linux, Datadog, PostgreSQL, Redis, Incident Management
About the Company
Photoroom
Photoroom launched in 2020 after Y Combinator and is the world's most popular AI photo editor. Its goal is to build technology that lets anyone create studio-level product images in minutes. It serves both individual creators and major enterprises through its B2C app and B2B API, with over 300 million downloads and more than 5 billion images processed annually. It is a profitable, remote-friendly company with Series B funding.
Job Details
Category: Infrastructure
Posted 7 months ago