NVIDIA is hiring a Senior Software Engineer - NIM Factory Infrastructure

About the Role

NVIDIA is looking for a Senior Software Engineer - NIM Factory Infrastructure to design and build factory automation for NVIDIA Inference Microservices (NIMs). You will apply deep technical expertise to create an efficient, scalable, and reliable automation pipeline that transforms AI models into validated, deployable NIMs.

What You'll Do

  • Develop, analyze, and optimize factory infrastructure that takes an AI model in and produces a deployable service validated across Cloud, On-prem, and Kubernetes environments.
  • Define the group's technical strategies and roadmaps, and iterate on them rapidly to deliver and improve the NIM factory.
  • Develop harnesses, automate hardware acceptance, analyze benchmarks, gather data, and perform statistical analysis of the system health and performance of NIMs.
  • Design and develop scalable, reliable processes for factory acceptance and performance tuning of hardware platforms.
  • Collaborate with multiple AI model teams to understand requirements and build efficient infrastructure that improves team productivity.
  • Define metrics and drive improvements based on user feedback.
  • Mentor teammates and collaborate both within the team and with other teams.

What We're Looking For

  • History of using advanced programming skills to build tooling and automation for hardware system characterization and benchmarking.
  • Proven experience debugging and analyzing performance of compute applications and systems.
  • Deep technical expertise working with system software and platform layers, including kernel, device drivers, memory, storage, networking, and PCIe devices.
  • Experience working with hardware clusters, distributed systems, networking, GPU interconnects (PCIe, NVLink), and node and cluster interconnects (InfiniBand).
  • Passion for building platform engineering components and automation of system benchmarking and characterization.
  • Excellent interpersonal skills and the ability to lead multi-functional efforts.
  • BS or MS in Computer Science, Computer Engineering, or a related field (or equivalent experience).
  • 5+ years of proven experience developing performant microservices, cloud software and/or tooling.

Nice to Have

  • Experience delivering optimized system engineering environments for inference applications on data center and consumer-grade hardware platforms.
  • History of building and deploying automated benchmarking solutions in Cloud and On-prem environments, and their associated CI/CD pipelines.
  • Prior experience working with large-scale compute infrastructure solutions.

Technical Stack

  • Docker
  • Kubernetes
  • Cloud
  • On-prem
  • GPU

Benefits & Compensation

  • Compensation: 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4
  • Equity: Eligible
  • Benefits (via NVIDIA benefits page)

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Required Skills

  • Docker
  • Kubernetes
  • Cloud Infrastructure
  • On-prem Infrastructure
  • GPU Computing
  • Infrastructure as Code
  • CI/CD
  • Distributed Systems
  • Automation
  • Monitoring
  • Networking
  • Security
  • Performance Optimization

About the Company

NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.

Job Details

  • Category: management
  • Posted: 4 months ago