NVIDIA is hiring a Senior Site Reliability Engineer, Cloud

About the Role

This role involves designing and maintaining highly available cloud services through automation, monitoring, and incident response, ensuring systems scale efficiently and remain stable under load.

Responsibilities

  • Design and implement scalable infrastructure for cloud platforms
  • Develop automation tools to streamline operations and reduce manual intervention
  • Monitor system performance and proactively address potential issues
  • Respond to incidents with a focus on rapid resolution and root cause analysis
  • Collaborate with development teams to improve service reliability
  • Maintain system uptime and optimize availability across services
  • Create and manage configurations for cloud environments
  • Support deployment pipelines and continuous integration workflows
  • Enforce security standards within infrastructure and deployment processes
  • Document system architecture and operational procedures

Nice to Have

  • Master's degree in a technical discipline
  • Experience supporting large-scale production systems
  • Background in site reliability or platform engineering
  • In-depth knowledge of CI/CD pipelines
  • Familiarity with service mesh technologies
  • Exposure to security compliance frameworks
  • Contributions to open-source projects
  • Experience with multi-region cloud deployments
  • Strong written and verbal communication skills
  • Ability to mentor junior engineers

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model with flexibility based on location

Team

Part of the global cloud infrastructure team focused on scalable systems

Why Join Us

  • Work on cutting-edge cloud infrastructure supporting AI and high-performance computing
  • Collaborate with world-class engineers solving complex scalability challenges
  • Opportunity to influence architecture and operational practices

What We Offer

  • Comprehensive health and wellness benefits
  • Professional development and training programs
  • Employee resource groups and inclusive culture

Visa sponsorship available for qualified candidates

Required Skills

Python, Go, Perl, Ruby, Linux, Kubernetes, Docker, Cloud Infrastructure, Automation, Distributed Systems, Monitoring, CI/CD

About the Company

NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.

Job Details

Category: Infrastructure
Posted: 10 months ago