About the Role
This role involves developing and optimizing system software that supports partner integration and deployment of high-performance computing solutions, with a focus on improving performance and scalability across distributed systems in close collaboration with partner teams.
Responsibilities
- Design and optimize low-level software components for distributed computing environments
- Collaborate with partner engineering teams to integrate communication libraries
- Improve performance and scalability of system-level software in GPU-accelerated clusters
- Diagnose and resolve complex software issues impacting partner deployments
- Develop tools and frameworks to streamline integration workflows
- Support debugging and tuning of communication primitives across hardware platforms
- Contribute to the evolution of collective communication algorithms
- Work closely with hardware and driver teams to ensure compatibility
- Produce technical documentation for internal and external stakeholders
- Assist partners in adopting optimized communication libraries
- Analyze system bottlenecks and propose architectural improvements
- Ensure software reliability under high-load conditions
- Participate in code reviews and maintain code quality standards
- Implement testing strategies for cross-platform validation
- Stay current with advancements in parallel computing and networking
- Optimize software for diverse data center configurations
- Support performance benchmarking and profiling activities
- Integrate feedback from partners into product enhancements
- Contribute to open-source projects related to communication layers
- Facilitate knowledge transfer between internal and external teams
- Ensure compliance with software interface standards
- Develop proof-of-concept implementations for new features
- Collaborate on defining roadmap priorities for system software
- Troubleshoot interoperability issues across software stacks
- Promote best practices in system-level software development
Compensation
Competitive salary and benefits package commensurate with experience
Work Arrangement
Hybrid work model with flexibility based on role and location
Team
Part of a global engineering team focused on system software and partner collaboration
About the Team
This team focuses on developing core communication libraries that power large-scale AI and high-performance computing systems, enabling seamless integration across diverse hardware and software environments.
Why This Role Matters
The work directly impacts the efficiency and scalability of distributed computing solutions used by leading research and enterprise organizations worldwide.
Limited sponsorship may be available for qualified candidates.


