Locations: Germany (Remote); Zurich, Switzerland; Munich, Germany; Berlin, Germany

NVIDIA is hiring a Senior HPC and AI Networking Performance Research and Analysis Engineer

About the Role

As a Senior HPC and AI Networking Performance Research and Analysis Engineer, you will investigate and improve the performance of AI workloads running on large-scale GPU and CPU systems. Your primary focus will be distributed deep learning applications, particularly large language model training and inference, where communication patterns and network efficiency are critical.

Key Responsibilities

  • Conduct in-depth profiling and analysis of AI workloads to uncover performance bottlenecks, especially in communication and data transfer layers
  • Design and execute benchmarking strategies to evaluate system behavior under real-world conditions
  • Collaborate with hardware and software teams to assess performance across CPUs, GPUs, host channel adapters, and network switches
  • Develop and apply simulation models, performance tools, and analytical methods to diagnose system limitations
  • Investigate low-level system interactions to determine root causes of performance issues
  • Establish performance baselines and define testing strategies for emerging technologies
  • Guide optimization efforts to achieve maximum system throughput and efficiency

Qualifications

Applicants should hold a Bachelor's degree in Computer Science or Software Engineering and bring at least six years of hands-on experience in high-performance networking. Essential skills include deep familiarity with RDMA, MPI, NCCL, and networking protocols such as RoCE. Proficiency in Python, Bash, and C is required, along with strong Linux system knowledge.

Experience with NVIDIA GPUs, CUDA libraries, and deep learning frameworks like TensorFlow or PyTorch is necessary. Demonstrated ability in performance analysis, problem solving, and cross-team collaboration is essential.

Preferred Background

  • Proven track record in benchmarking AI workloads, especially for distributed LLM training
  • Strong understanding of CUDA and NCCL internals
  • Comprehensive knowledge of system architecture, including CPUs (Intel, AMD, ARM), GPUs, memory, and PCI subsystems
  • Familiarity with congestion control mechanisms in high-speed networks

Required Skills

RDMA, MPI, NCCL, RoCE, CUDA, TensorFlow, PyTorch, Python, Bash, C, Performance Analysis, Distributed Deep Learning, Collective Communication, HPC, Networking

About the Company

NVIDIA

NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.

Job Details

Department: Performance group
Category: Data
Posted: 2 months ago