Responsibilities
- Develop software to automate manual operational tasks across the software lifecycle.
- Diagnose and resolve critical incidents, lead blameless post-incident reviews, and implement fixes to prevent recurrence.
- Work closely with development teams from design through deployment to ensure systems are built for reliability and scale.
- Analyze application behavior and metrics to define meaningful service level objectives (SLOs).
- Design patterns that enable systems to recover automatically and tolerate failures.
- Build automated solutions for software updates, configuration changes, and product releases.
- Collaborate with senior engineers and mentor less experienced team members.
- Design, deploy, and manage AWS cloud environments with an emphasis on automation, scalability, and security.
- Implement and manage infrastructure as code using tools such as Terraform.
- Continuously monitor system performance, availability, and security, applying observability practices to improve system health.


