Requirements

10+ years of software and infrastructure engineering experience, including significant experience operating infrastructure-as-code platforms in cloud-first organizations.
Experience designing and operating large-scale Kubernetes platforms and scaling compute services on Kubernetes; experience with related cloud-native technologies such as ArgoCD, Argo Rollouts, and Istio.
Deep understanding of Kubernetes platform architecture and operations, including workload isolation, autoscaling, networking, service mesh management, ingress patterns, observability, upgrades, and multi-tenant cluster design.
Experience designing and maintaining CI/CD systems for both infrastructure-as-code deployments and application delivery workflows (e.g., Terragrunt, Atlas, ArgoCD, Octopus Deploy, Travis CI).
Experience building scalable infrastructure-as-code platforms using Terraform and related tooling, including modular architectures, remote state management, policy enforcement, deployment orchestration, and reusable infrastructure patterns.
Experience with monitoring and observability tooling and practices (metrics, logs, traces) and with managing them at scale, including major observability platforms such as Grafana, Datadog, and Honeycomb.
Comfortable implementing and securing services in Google Cloud Platform as infrastructure-as-code, including GCP projects, VPC networks, Google Kubernetes Engine, IAM roles, groups, policies, and secure networking patterns.
Experience designing secure-by-default infrastructure including least-privilege access controls, workload identity, network segmentation, secret management, auditability, and compliance-oriented platform controls.
Strong operational instincts and experience debugging complex distributed systems, leading incident response efforts, and improving reliability through automation and observability.
Experience balancing developer experience, platform governance, operational reliability, and organizational scalability in fast-growing engineering environments.
Experience with backend languages (e.g., Python, Go, Node.js, Rust).

Nice to Have

Stays up to date on industry best practices and tools, and enjoys learning new things.
Excited about being hands-on while also driving platform direction, architecture decisions, and operational maturity in a fast-moving and supportive environment.
Willing to pitch in wherever needed: as a fast-moving startup, we need to do good work quickly.
Demonstrates strong curiosity and a proactive interest in AI, actively exploring and applying emerging technologies.

Work Arrangement

Hybrid — San Francisco, New York, Pittsburgh

Additional Information

This role is approximately 80% infrastructure focused and 20% application software focused.
This role has a rotational on-call schedule.
You will have the opportunity to shape incident response practices, operational standards, and platform reliability strategy for the team and throughout the organization.

Abridge is hiring a Staff Platform Engineer
