Taguig City, Philippines | Remote (Global)

Acquireai is hiring a Site Reliability Engineer

Acquireai is looking for a Site Reliability Engineer to serve as the guardian of our production systems. You will ensure the reliability, scalability, and performance of our IoT telemetry platform by defining SLOs, automating operational processes, and building the infrastructure and tooling that enable our engineering teams to deploy with confidence.

What You'll Do

  • Define, monitor, and enforce Service Level Objectives (SLOs) and error budgets across all production systems.
  • Track error budget burn rates and make data-driven decisions to halt risky deployments when thresholds are exceeded.
  • Implement comprehensive monitoring and alerting strategies using Prometheus, Grafana, and PagerDuty.
  • Design and implement Infrastructure as Code (IaC) solutions using Pulumi with TypeScript.
  • Manage and optimize AWS services, including EKS (Elastic Kubernetes Service), MSK (Managed Streaming for Kafka), and S3, along with our SingleStore and MongoDB data stores.
  • Automate operational processes to eliminate toil, targeting any task that consumes more than 2 engineer-days per quarter.
  • Serve as incident commander during production outages and service degradations.
  • Lead comprehensive post-mortem processes within 48 hours of incidents and drive 'never-again' corrective actions to completion.
  • Maintain and improve incident response procedures and runbooks.
  • Implement and enforce least-privilege IAM policies across all AWS resources.
  • Manage security patch pipelines and vulnerability remediation processes.
  • Support compliance initiatives including SOC2 and ISO 27001 certification requirements.
  • Participate in a follow-the-sun on-call rotation, with a one-week primary/secondary commitment every five weeks.
  • Provide 24×7 support coverage across AU/NZ, EU/ZA, and MX time zones.
  • Maintain operational runbooks and knowledge transfer documentation.

What We're Looking For

  • Proven experience defining and enforcing Service Level Objectives (SLOs) and error budgets in a production environment.
  • Deep hands-on experience with monitoring and alerting tools like Prometheus and Grafana.
  • Expertise in Infrastructure as Code using Pulumi, Terraform, or similar tools.
  • Strong experience managing and optimizing AWS services, particularly EKS and MSK.
  • Proficiency in a scripting or programming language such as TypeScript, Python, or Go.
  • Experience automating operational workflows and eliminating manual toil.
  • Demonstrated ability to lead incident response and post-mortem processes.
  • Strong knowledge of cloud security best practices, including IAM policy management and vulnerability remediation.
  • Experience supporting SOC2, ISO 27001, or similar compliance frameworks.
  • Willingness to participate in a global on-call rotation.

Technical Stack

  • Monitoring & Alerting: Prometheus, Grafana, PagerDuty
  • Infrastructure as Code: Pulumi, TypeScript
  • Cloud Platform: AWS
  • Core Services: EKS (Elastic Kubernetes Service), MSK (Managed Streaming for Kafka), SingleStore, MongoDB, S3

Work Mode

This is a global, remote position. Candidates should be located in, and authorized to work in, a region within the AU/NZ, EU/ZA, or MX time zones so they can support our follow-the-sun on-call model.

Acquireai is an equal opportunity employer.

Required Skills
Prometheus, Grafana, PagerDuty, Pulumi, TypeScript, AWS, EKS, MSK, SingleStore, MongoDB, Kubernetes, Infrastructure as Code, Monitoring, Alerting, Cloud Architecture
About company
Acquireai
We’re an award-winning global outsourcer providing contact center and back-office services for clients worldwide.
Job Details
Department: Information Technology
Category: Infrastructure
Posted: 2 months ago