Responsibilities
- Own the full lifecycle of ML model deployment on robots, from handoff by the ML team to full system integration.
- Convert, optimize, and integrate trained models for Jetson platforms using NVIDIA tools (e.g., exporting PyTorch models to ONNX and building TensorRT engines).
- Develop and optimize CUDA kernels and pipelines for low-latency, high-throughput model inference.
- Profile and benchmark existing ML workloads using tools such as Nsight, nvprof, and the TensorRT profiler.
- Identify and remove compute and memory bottlenecks for real-time inference.
- Design and implement quantization, pruning, and other model compression strategies suited to edge inference.
- Ensure models run reliably within the resource constraints of real-time, low-power robotic systems.
- Manage memory layout, concurrency, and scheduling for optimized GPU and CPU usage on Jetson devices.
- Build benchmarking pipelines for continuous performance evaluation on hardware-in-the-loop systems.
- Collaborate with QA and systems teams to validate model behavior in field scenarios.
- Work closely with ML researchers to influence model architectures for edge deployability, and advise on the feasibility of real-time ML models in the robotics stack.

