United States | Remote (Country) | USD 221,000 - 260,000 Yearly

Sumo Logic, Inc. is hiring a Staff Machine Learning Engineer

Sumo Logic is looking for a Staff Machine Learning Engineer to lead the design and delivery of the next generation of Agentic AI systems for our Security Operations Center. You will evaluate, prototype, and productionize state-of-the-art agentic AI technologies, building scalable multi-agent architectures that reason over large-scale machine data to drive real-time security detection, investigation, and response.

What You'll Do

  • Lead and collaborate on the technical evaluation and adoption of cutting-edge agentic AI platforms, including Anthropic (Claude), LangChain/LangGraph, AWS Bedrock, and other emerging frameworks.
  • Architect, prototype, and productionize multi-agent AI systems for Agentic SOC use cases like detection, triage, investigation, and response workflows.
  • Own the design of core agent architecture components, including planning, execution, tool orchestration, memory, context engineering, and long-running agent workflows.
  • Lead AI agent evaluation systems, including offline and online evaluation pipelines, golden datasets, synthetic data generation, human- and LLM-based judging, and continuous quality monitoring.
  • Drive LLM fine-tuning and alignment efforts to improve domain-specific reasoning, accuracy, and reliability for security and observability use cases.
  • Design scalable LLMOps and AI agent infrastructure, including inference routing, latency optimization, cost control, and production observability.
  • Partner with product, security, and data platform leadership to deliver end-to-end AI agent capabilities from prototype to customer-facing production systems.
  • Provide technical direction and mentorship, as lead or partner, to AI engineers working on agentic AI and LLM systems.
  • Define and implement best practices for AI safety, reliability, evaluation, and monitoring in production agentic systems.
  • Operate as a senior technical owner in ambiguous problem spaces, setting technical direction, breaking down complex problems, and driving delivery across teams.

What We're Looking For

  • B.Tech, M.Tech, or Ph.D. in Computer Science, Machine Learning, Data Science, or a related technical field.
  • 5+ years of hands-on industry experience building, operating, and leading production ML/AI systems, with demonstrated technical leadership.
  • Strong foundation in machine learning, distributed systems, data pipelines, and large-scale system design.
  • Deep understanding of LLMs, prompt engineering, context engineering, agentic AI design patterns, and reasoning workflows.
  • Strong proficiency in Python and modern ML/AI ecosystems.
  • Experience designing and operating evaluation frameworks for ML/LLM systems (offline and online).
  • Proven ability to lead complex technical initiatives across teams and influence architecture decisions.
  • Excellent communication skills and ability to translate complex AI systems into business impact.

Nice to Have

  • Hands-on experience building and scaling agentic AI systems or multi-agent architectures in production.
  • Experience with modern agent frameworks such as LangGraph, LangChain, CrewAI, or similar.
  • Experience with major foundation model platforms such as Anthropic, OpenAI, AWS Bedrock, or Vertex AI.
  • Experience with LLM fine-tuning pipelines (SFT, RLHF/RLAIF, preference learning, domain adaptation).
  • Strong background in LLMOps, including inference optimization, latency/cost management, observability, and production monitoring.
  • Experience with ML infrastructure and tooling such as PyTorch, MLflow, Airflow, Docker, Kubernetes, and cloud platforms (AWS/GCP/Azure).
  • Experience applying AI/ML to security, observability, or large-scale log/telemetry data.

Technical Stack

  • Python, Anthropic (Claude), LangChain/LangGraph, AWS Bedrock
  • PyTorch, MLflow, Airflow, Docker, Kubernetes
  • AWS, GCP, Azure

Benefits & Compensation

  • Compensation range: USD 221,000 - 260,000 per year

Work Mode

This is a remote position open to candidates based in the United States.

Required Skills
Python, Anthropic (Claude), LangChain/LangGraph, AWS Bedrock, PyTorch, MLflow, Airflow, Docker, Kubernetes, AWS, Machine Learning, LLMs, Prompt Engineering, Distributed Systems
About company
Sumo Logic, Inc.
Sumo Logic helps make the digital world secure, fast, and reliable by unifying critical security and operational data through its Intelligent Operations Platform. Built to address the increasing complexity of modern cybersecurity and cloud operations challenges, the company empowers digital teams to move from reaction to readiness—combining agentic AI-powered SIEM and log analytics into a single platform to detect, investigate, and resolve modern challenges. The platform enables organizations to protect against security threats, ensure reliability, and gain powerful insights into their digital environments.
Job Details
Department: Software Development
Category: Data
Posted: 3 months ago