As a Machine Learning Infrastructure Engineer, you'll build and maintain robust, GPU-powered systems that serve advanced AI models for detecting cryptocurrency fraud and financial crime. Your work will directly shape the performance, scalability, and reliability of production inference platforms operating at high throughput.
What You’ll Do
You’ll architect and manage GPU clusters in cloud environments, ensuring efficient orchestration, autoscaling, and workload scheduling across multiple models and users. You'll optimize inference pipelines for maximum token throughput, batching efficiency, and GPU utilization, balancing latency and cost across interactive and batch scenarios.
You'll implement distributed serving strategies such as model and tensor parallelism, and integrate inference acceleration tools like TensorRT, ONNX Runtime, vLLM, and FlashAttention. Your systems will support heterogeneous accelerators, including NVIDIA GPUs and AWS Inferentia, with strong resource isolation and predictable performance under variable load.
You’ll build comprehensive observability into the infrastructure, tracking metrics like GPU occupancy, memory use, queue depth, and throughput to guide performance improvements. You’ll also collaborate closely with ML, infrastructure, and product teams to ensure seamless transitions from research to production.
What We’re Looking For
- Bachelor’s degree in Computer Science or related field, or equivalent experience
- 5+ years building and operating distributed systems or infrastructure in production
- Proven experience deploying ML or LLM inference workloads on GPU clusters in AWS or GCP
- Deep knowledge of inference optimization, batching, and throughput tuning
- Hands-on experience with Triton, vLLM, Ray Serve, ONNX Runtime, or similar frameworks
- Proficiency with Kubernetes and cloud orchestration
- Understanding of distributed inference patterns and GPU performance bottlenecks
- Strong communication skills and ability to work across technical domains
- Self-directed, adaptable, and committed to ownership and results
Preferred Experience
- Familiarity with non-NVIDIA accelerators such as AWS Inferentia
- Experience with CUDA and debugging GPU-level performance issues
Environment & Culture
This role operates in a fast-moving, mission-driven setting where adaptability and problem-solving are essential. You’ll work in a distributed-first team with hubs across global cities, collaborating frequently and iteratively. AI fluency is expected, and you’ll be encouraged to use AI tools to enhance productivity and innovation. The culture values ownership, continuous learning, clear communication, and collective impact in building a more secure financial ecosystem.