Responsibilities
- Ensure systems and services remain available, performant, and scalable by proactively monitoring and maintaining them and planning for future capacity.
- Respond to, analyze, and resolve system outages and disruptions, and implement preventive measures to keep them from recurring.
- Build automation tools and scripts to reduce manual tasks, enhance efficiency, and strengthen system resilience.
- Continuously monitor and optimize system performance and resource utilization, identifying and resolving bottlenecks.
- Partner with development teams to embed reliability, scalability, and performance practices into the software development lifecycle.
- Keep current with emerging technology trends and contribute to internal technical communities to promote engineering excellence.


