About the Role

This role involves designing and maintaining reliable systems, automating operational processes, and collaborating across teams to improve service resilience and incident response.

Responsibilities

Design and implement scalable infrastructure solutions
Monitor system performance and respond to incidents
Develop automation tools to reduce manual operations
Collaborate with development teams to enhance system reliability
Troubleshoot and resolve complex technical issues
Maintain system documentation and runbooks
Participate in on-call rotations for incident management
Improve system availability and reduce downtime
Implement proactive alerting and monitoring systems
Support cloud infrastructure and migration initiatives
Enforce security and compliance standards
Drive post-incident reviews and follow-up actions
Improve deployment reliability and rollback procedures
Contribute to capacity planning and performance tuning
Promote best practices in system design and operations
Integrate reliability into the software development lifecycle
Use data to identify and resolve system bottlenecks
Manage configuration and change control processes
Support disaster recovery planning and testing
Ensure systems meet service level objectives
Work with cross-functional teams to resolve production issues
Evaluate new technologies for operational improvements
Mentor junior engineers and share technical knowledge
Maintain focus on customer impact during outages
Contribute to engineering standards and operational policies

Nice to Have

Master’s degree in a technical field
Certifications in cloud or systems engineering
Experience with large-scale enterprise systems
Background in financial or regulated industries
Knowledge of Kubernetes and service mesh technologies
Experience with infrastructure as code tools
Familiarity with observability platforms
Contributions to open-source projects
Public speaking or technical writing experience
Leadership in incident command roles

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model with flexible location options

Team

Part of a global engineering team focused on system reliability and performance

Why This Role Matters

This position plays a critical role in maintaining the stability and performance of systems that support enterprise clients.
Engineers in this role directly influence uptime, scalability, and the overall customer experience.

What to Expect

You will work across time zones with global teams.
Expect a mix of strategic planning and hands-on technical problem solving.
Opportunities for professional growth and technical leadership are built into the role.

Available for qualified candidates

Ensono is hiring a Senior Site Reliability Engineer

