Responsibilities
- Work closely with engineering teams to design, build, secure, and scale a SaaS platform on Azure using services such as AKS, Azure Functions, and Azure Service Bus.
- Lead the creation and refinement of CI/CD pipelines, infrastructure automation, and release workflows using Terraform, Helm, GitHub Actions, and Azure DevOps.
- Promote Site Reliability Engineering (SRE) practices across teams, emphasizing observability, incident management, system availability, performance tracking, and operational rigor.
- Support FedRAMP compliance initiatives through secure configuration management, audit logging, documentation, and adherence to operational protocols.
- Deliver advanced operational support (Tier 2/3), including maintenance window coverage and participation in an on-call rotation during North American business hours.
- Contribute to capacity planning, cost-efficiency analysis, and scaling strategies for critical infrastructure components.
- Enhance monitoring systems, alerting mechanisms, and incident runbooks to ensure system reliability and fast response times.
- Participate in planning for security resilience, system availability, and disaster recovery.
- Document operational procedures, architectural decisions, and response protocols using tools such as Jira and Confluence.
Work Arrangement
Remote (Worldwide)
Other
- Remote work is supported worldwide; however, only US Citizens and US Permanent Residents are eligible for this role.
- Participation in an on-call rotation during North American business hours is mandatory.