Australia (remote) Remote (Global)

ClickHouse is hiring a Database Reliability Engineer - Core Team

Responsibilities

  • Continuously enhance the reliability and performance of the core database system.
  • Develop and improve metrics and alerts to identify and prevent production issues before they impact users.
  • Investigate common customer issues to find root causes and propose fixes, reports, and improvements.
  • Refine incident response processes and post-mortem analyses for outages, collaborating with support and cloud teams to inform affected users.
  • Plan, implement, and lead chaos engineering initiatives across engineering teams based on internal priorities.
  • Oversee on-call processes to address performance and reliability issues, establishing best practices for issue resolution and minimizing user impact.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering.
  • Previous experience operating the core database system or other SQL databases in production.
  • Scripting experience with Shell or Python, and ability to read and understand C++ code.
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
  • Strong problem-solving skills and solid production debugging abilities.
  • Ability to thrive in a fast-paced environment as part of a global team, with a focus on business goals.
  • High level of responsibility, ownership, and accountability.
  • Excellent communication skills.

Nice to Have

  • Excellent understanding of distributed database internals and SQL, particularly the core database system.

Work Arrangement

Remote (Worldwide)

Team

Site Reliability Engineering team for the core database system

Required Skills

  • Reliability Engineering, QA, or customer-facing engineering
  • Shell or Python
  • Ability to read and understand C++ code
  • Cloud computing platforms such as AWS, Azure, or Google Cloud Platform
About company
ClickHouse
ClickHouse is a private cloud company recognized on the 2025 Forbes Cloud 100 list. It leads the market in real-time analytics, data warehousing, observability, and AI workloads, serving over 2,000 customers.
Job Details
Department: Core Team
Category: Infrastructure
Posted: 14 days ago