As a Machine Learning Infrastructure Engineer, you'll build and maintain robust, GPU-powered systems that serve advanced AI models for detecting cryptocurrency fraud and financial crime. Your work will directly shape the performance, scalability, and reliability of production inference platforms operating at high throughput.
What You’ll Do
You’ll architect and manage GPU clusters in cloud environments, ensuring efficient orchestration, autoscaling, and workload scheduling across multiple models and users. You'll optimize inference pipelines for maximum token throughput, batching efficiency, and GPU utilization, balancing latency and cost across interactive and batch scenarios.
You'll implement distributed serving strategies such as model and tensor parallelism, and integrate inference acceleration tools like TensorRT, ONNX Runtime, vLLM, and FlashAttention. Your systems will support heterogeneous accelerators, including NVIDIA GPUs and AWS Inferentia, with strong resource isolation and predictable performance under variable load.
You’ll build comprehensive observability into the infrastructure, tracking metrics like GPU occupancy, memory use, queue depth, and throughput to guide performance improvements. You’ll also collaborate closely with ML, infrastructure, and product teams to ensure seamless transitions from research to production.
What We’re Looking For
- Bachelor’s degree in Computer Science or related field, or equivalent experience
- 5+ years building and operating distributed systems or infrastructure in production
- Proven experience deploying ML or LLM inference workloads on GPU clusters in AWS or GCP
- Deep knowledge of inference optimization, batching, and throughput tuning
- Hands-on experience with Triton, vLLM, Ray Serve, ONNX Runtime, or similar frameworks
- Proficiency with Kubernetes and cloud orchestration
- Understanding of distributed inference patterns and GPU performance bottlenecks
- Strong communication skills and ability to work across technical domains
- Self-directed, adaptable, and committed to ownership and results
Preferred Experience
- Familiarity with non-NVIDIA accelerators such as AWS Inferentia
- Experience with CUDA and debugging GPU-level performance issues
Environment & Culture
This role operates in a fast-moving, mission-driven setting where adaptability and problem-solving are essential. You’ll work in a distributed-first team with hubs across global cities, collaborating frequently and iteratively. AI fluency is expected, and you’ll be encouraged to use AI tools to enhance productivity and innovation. The culture values ownership, continuous learning, clear communication, and collective impact in building a more secure financial ecosystem.