San Francisco, United States of America · Remote (Global)

TRM Labs is hiring a Machine Learning Infrastructure Engineer

About the Role

As a Machine Learning Infrastructure Engineer, you'll develop and maintain robust GPU-based systems that power advanced AI models for identifying cryptocurrency fraud and financial crime. Your work will directly impact the performance, scalability, and reliability of production inference platforms operating at high throughput.

What You’ll Do

You’ll architect and manage GPU clusters in cloud environments, ensuring efficient orchestration, autoscaling, and workload scheduling across multiple models and users. You'll optimize inference pipelines for maximum token throughput, batching efficiency, and GPU utilization, balancing latency and cost across interactive and batch scenarios.

You'll implement distributed serving strategies such as model and tensor parallelism, and integrate performance acceleration tools like TensorRT, ONNX Runtime, vLLM, and FlashAttention. Your systems will support heterogeneous accelerators, including NVIDIA GPUs and Inferentia, with strong resource isolation and predictable performance under variable load.

You’ll build comprehensive observability into the infrastructure, tracking metrics like GPU occupancy, memory use, queue depth, and throughput to guide performance improvements. You’ll also collaborate closely with ML, infrastructure, and product teams to ensure seamless transitions from research to production.

What We’re Looking For

  • Bachelor’s degree in Computer Science or related field, or equivalent experience
  • 5+ years building and operating distributed systems or infrastructure in production
  • Proven experience deploying ML or LLM inference workloads on GPU clusters in AWS or GCP
  • Deep knowledge of inference optimization, batching, and throughput tuning
  • Hands-on experience with Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or similar frameworks
  • Proficiency with Kubernetes and cloud orchestration
  • Understanding of distributed inference patterns and GPU performance bottlenecks
  • Strong communication skills and ability to work across technical domains
  • Self-directed, adaptable, and committed to ownership and results

Preferred Experience

  • Familiarity with non-NVIDIA accelerators such as Inferentia
  • CUDA experience and debugging GPU-level issues

Environment & Culture

This role operates in a fast-moving, mission-driven setting where adaptability and problem-solving are essential. You’ll work in a distributed-first team with hubs in cities around the world, collaborating frequently and iteratively. AI fluency is expected, and you’ll be encouraged to use AI tools to enhance productivity and innovation. The culture values ownership, continuous learning, clear communication, and collective impact in building a more secure financial ecosystem.

Required Skills
TensorRT, ONNX Runtime, vLLM, FlashAttention, Triton Inference Server, Ray Serve, HuggingFace Optimum, Kubernetes, AWS, GCP, ML Inference Optimization, GPU Cluster Management, Distributed Systems, High-Throughput Inference, Cloud Infrastructure, Model Serving

About the Company
TRM Labs

TRM Labs provides a trusted blockchain intelligence platform that enables organizations to detect, monitor, and investigate crypto-related crime. The company empowers government agencies, financial institutions, and crypto businesses with tools to ensure compliance, conduct investigations, and safeguard the digital asset ecosystem.

Its platform offers extensive asset coverage across 190+ blockchains, including deep analytics for DeFi protocols and NFTs, supported by over 155 risk categories. TRM’s solutions are designed to support compliance, sanctions enforcement, fraud prevention, and real-time supervision with AI-driven capabilities like Co-Case Agent™ for accelerating investigations.

Trusted globally by leading organizations such as Goldman Sachs, Binance, and law enforcement agencies, TRM Labs combines proprietary threat intelligence with advanced data science to deliver actionable insights and help disrupt illicit financial flows in the crypto economy.

Job Details
Category: Infrastructure
Posted: 2 months ago