Responsibilities
- Develop, maintain, and review Terraform modules to provision and manage cloud resources.
- Apply best practices for Terraform state management, module design, and testing.
- Leverage CDK for Terraform (CDKTF) or Terragrunt patterns where appropriate.
- Architect and operate services in Azure and AWS (compute, storage, networking, databases).
- Provision, configure, and manage Databricks workspaces, clusters and jobs to support data pipelines.
- Implement tagging, cost-optimisation, and security controls across both clouds and Databricks environments.
- Build, test, and maintain Azure DevOps Pipelines or GitHub Actions to automate infrastructure provisioning and application deployment.
- Manage source code in Azure DevOps or GitHub repos, and enforce branching strategies, pull-request workflows, and policy-as-code checks.
- Partner with client and data engineering teams to integrate infrastructure changes safely into development workflows.
- Produce clear architecture diagrams, runbooks, and “how-to” guides for both technical and non-technical stakeholders.
- Write and maintain automation scripts and CLI tools in Python to streamline operational tasks.
- Contribute to internal SDKs or libraries for shared tooling and infrastructure utilities.
Requirements
- Experience: 3+ years in a DevOps or SRE role with production workloads in Azure and AWS.
- IaC Proficiency: Deep hands-on Terraform experience, including Terraform Cloud, Terragrunt, module design, and testing.
- CI/CD Expertise: Proven ability to build and maintain Azure DevOps Pipelines and to manage code in Azure DevOps or GitHub repos.
- Programming Skills: Strong Python scripting skills; able to write clean, testable, and reusable code.
- Data Platform Understanding: Familiarity with provisioning and managing Databricks workspaces, clusters, and jobs.
- Disaster Recovery: Experience designing DR plans, automating backups, and conducting restore drills.
- Cloud Services: Solid understanding of core Azure and AWS services (VMs, networking, storage, IAM, RDS/SQL).
- Problem-Solving: Excellent troubleshooting abilities and a calm, methodical approach to incident response.
- Communication: Clear written and verbal communication; adept at documenting complex systems simply.
Nice to Have
- Advanced IaC Tools: Experience with CDK for Terraform (CDKTF) or Terragrunt for composing infrastructure patterns.
- Languages: PowerShell, Bicep, Python, Terraform.
- Security & Compliance: Familiarity with cloud security best practices (CIS benchmarks, Azure Policy, AWS IAM policies).
- Networking: In-depth knowledge of VPCs, VNets, load balancers, VPNs and cross-region connectivity.
- Serverless & Containers: Experience with serverless platforms (Azure Functions, AWS Lambda) and container registries.