Responsibilities
- Design, provision, and manage infrastructure primarily on Google Cloud (GKE) and other cloud environments, with Kubernetes as the core platform abstraction. Work directly with engineering counterparts on the design and implementation of cloud-based infrastructure solutions.
- Build platform capabilities, internal developer portal (IDP) components, and self-service tooling that reduces friction for engineers and standardizes how services run in production.
- Design and build automations and infrastructure tooling with GitOps as a fundamental principle, integrating with APIs, cloud SDKs, and Kubernetes components.
- Create and manage CI/CD pipelines (e.g., ArgoCD, Argo Rollouts, Argo Workflows, GitHub Actions), ensuring safe, observable, and repeatable deployments across environments.
- Partner with engineering teams to define scalable infrastructure patterns, reusable modules, and cost-efficient solutions based on usage and growth requirements.
- Monitor infrastructure spend and contribute to cost analysis, optimization, and FinOps practices to improve cloud efficiency.
- Improve observability, alerting, SLOs, and runbooks to strengthen reliability and incident response.
- Implement cloud security and compliance best practices, including IAM design, workload identity, secrets management, and container security standards.
Requirements
- 5+ years in DevOps, SRE, or Platform Engineering, with strong production experience in GCP and Kubernetes (GKE)
- Strong automation skills, with programming and scripting experience in Python, including working with APIs, SDKs, and CLIs
- Proven experience operating Kubernetes in production, including Helm, ArgoCD, operators, Kustomize, networking, autoscaling, and RBAC
- Deep expertise in Terraform / OpenTofu and infrastructure-as-code best practices; experience designing reusable modules and managing multi-environment deployments
- Solid understanding of cost modeling, tagging strategies, and cloud cost optimization techniques (FinOps awareness a plus)
- Familiarity with microservices architectures, service discovery, and containerization workflows
- Strong communication skills and a habit of clear, useful documentation
Nice to Have
- Experience with additional cloud providers
- Experience with Terragrunt
- Hands-on experience supporting hybrid or multi-cloud deployments where appropriate
- A platform mindset: you’ve built tools, abstractions, or paved roads that improve developer velocity and system reliability
Benefits
- 100% remote within the US
- Flexible vacation policy
- Annual vacation allowance for travel-related expenses
- Three-day weekend every month of the year
- Competitive compensation
- 100% healthcare coverage
- 401(k) plan
- Flexible Spending Account (FSA) for dependent, medical, and dental care
- Access to coaching, therapy, and professional development


