United States of America Remote (Global)

Lavendo is hiring a Senior AI/ML Specialist Solutions Architect (AI Infra & Cloud)

About the Role

As a Senior AI/ML Specialist Solutions Architect, you will lead the design and implementation of high-performance AI systems tailored to the needs of AI-driven organizations. Your work will focus on optimizing large-scale distributed training and inference workflows across multi-GPU and multi-node environments, ensuring solutions are production-ready, efficient, and aligned with long-term business objectives.

Key Responsibilities

Design and refine scalable AI architectures that leverage modern cloud and HPC infrastructure
Guide the evolution of machine learning pipelines from proof-of-concept to robust, production-grade systems
Collaborate with engineering and product teams to integrate customer feedback and shape future technology roadmaps
Deliver technical insights through whitepapers, presentations, and webinars to support customer success
Advise internal teams and clients on best practices in AI deployment, performance tuning, and infrastructure strategy
Foster trusted relationships with technical and business stakeholders to ensure alignment with strategic goals

Qualifications

Minimum of 5 years in cloud infrastructure, MLOps, or solutions architecture roles with a focus on AI/ML systems
Proven experience scaling AI models in production using frameworks such as PyTorch and JAX
Strong understanding of NVIDIA’s HPC ecosystem, including CUDA, NCCL, and InfiniBand networking
Excellent communication skills with the ability to translate complex technical concepts for diverse audiences
Eligibility to work full-time in the U.S. without sponsorship

Preferred Technical Experience

Programming: Python, Go, Java, or C++
Infrastructure as Code: Terraform, Ansible
Orchestration platforms: Kubernetes, Slurm
DevOps tools: Git, Docker, Helm
Big data technologies: Spark, Kafka, Hadoop
Database systems: SQL, NoSQL, and vector databases
ML frameworks: TensorFlow, Hugging Face, scikit-learn

Compensation & Benefits

Annual salary ranges from $225,000 to $315,000, negotiable based on experience and location. The role includes stock options and a 4% 401(k) match. Employees receive 100% company-paid medical, dental, and vision coverage for themselves and their families, along with company-paid disability and life insurance.

Additional benefits include 20 weeks of paid parental leave for primary caregivers, 12 weeks for secondary caregivers, and a monthly stipend for internet and mobile expenses. The position supports full remote flexibility for U.S.-based team members.

Technology & Impact

You’ll work with state-of-the-art AI infrastructure, including the latest NVIDIA GPUs and one of the most powerful commercially available supercomputers. The organization is committed to sustainable AI, operating energy-efficient data centers that repurpose waste heat to support local communities.

Company Values

The organization champions equitable access to AI infrastructure, empowers teams to build and scale AI solutions, and simplifies the challenges of AI development. It is dedicated to fostering an inclusive, diverse, and accessible workplace for all employees.

Required Skills

PyTorch, JAX, TensorFlow, Hugging Face, scikit-learn, CUDA, NCCL, InfiniBand, NVIDIA HPC ecosystem, Python, Go, Java, C++, Terraform, MLOps, AI/ML, Cloud Infrastructure, Multi-GPU Optimization, AI Workload Scaling, Distributed Systems

About company

The company builds AI-centric cloud infrastructure that combines large GPU clusters, high-speed networks, and cloud-native tooling into a platform used by enterprises, startups, and research teams. The goal is to enable serious AI and simulation workloads without requiring customers to build their own supercomputers.

Job Details

Category: Infrastructure

Posted: 2 months ago