About the Role
The role involves improving platform stability, scaling infrastructure efficiently, and ensuring high availability through proactive monitoring and incident response.
Responsibilities
- Design and maintain CI/CD pipelines for rapid and reliable software delivery
- Implement infrastructure as code using tools such as Terraform
- Monitor system performance and respond to alerts with clear resolution paths
- Optimize cloud resource usage for cost and performance efficiency
- Troubleshoot complex production issues across distributed systems
- Enforce security best practices in deployment and access management
- Collaborate with development teams to improve code deployability
- Lead incident response efforts and conduct post-mortem analyses
- Maintain documentation for systems and operational procedures
- Evaluate and integrate new technologies to improve system resilience
- Support compliance and audit requirements for production systems
- Automate repetitive operational tasks to reduce manual intervention
- Ensure disaster recovery plans are tested and up to date
- Scale infrastructure to meet growing service demands
- Promote observability through logging, metrics, and tracing
- Manage containerized workloads in Kubernetes environments
- Use monitoring tools such as Prometheus and Grafana to detect issues proactively
- Improve system uptime and reduce mean time to recovery
- Participate in on-call rotations for critical systems
- Drive improvements in system design based on operational feedback
- Ensure service level objectives are defined and tracked
- Support database performance and availability initiatives
- Integrate feedback loops into development workflows
- Maintain network and application security configurations
- Contribute to capacity planning and system architecture reviews
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid work model with flexible remote options
Team
Collaborative engineering team focused on scalable systems
Tech Stack
The team uses Kubernetes for container orchestration, Terraform for infrastructure provisioning, Prometheus and Grafana for monitoring, and GitLab CI for pipeline automation.
Growth Opportunities
Engineers are encouraged to lead initiatives, mentor peers, and contribute to architectural decisions.
Available for qualified candidates

