Responsibilities
- Be responsible for running test cases to validate NVIDIA GPU communication libraries (NCCL, NVSHMEM, UCX, GDRCopy, GPUDirect RDMA, etc.).
- Be responsible for automating test cases and maintaining the automation scripts.
- Collaborate with developers, PMs, marketing, and engineering teams on crafting test plans and implementing validation.
- Assist in architecting, crafting, and implementing SWQA test frameworks.
- Be responsible for improving code coverage and reducing code complexity.
Requirements
- BS or higher degree in CS/EE/CE or equivalent experience
- 5+ years of relevant experience
- Seasoned software QA or software testing background, with test infrastructure experience and strong analysis skills
- Proficiency in scripting languages (Python, Perl, bash)
- Solid experience with AI development tools for test development and automation
- Knowledge of basic networking concepts
- UNIX/Linux experience is required
- Experience in C/C++ is required
- Ability to work independently; leadership skills
- Experience applying a quality mindset to drive improvements
- Proficient oral and written English
Nice to Have
- Experience with CUDA programming and NVIDIA GPUs
- Knowledge of high-performance networks such as InfiniBand, RoCE, etc.
- Experience with CSPs (AWS, Google Cloud, Oracle Cloud Infrastructure, Microsoft Azure) and with HPC clusters, Slurm, Ansible, etc.
- Prior experience with virtualization technologies (KVM, Hyper-V, VMware, OpenStack, Docker, Kubernetes)
- Experience with deep learning frameworks such as PyTorch, TensorFlow, etc.


