Redwood City, California, United States

Sumo Logic, Inc. is hiring a Senior Machine Learning Engineer

Responsibilities

  • Create and manage scalable platforms for machine learning and large language model operations that support data versioning, training, evaluation, deployment, and monitoring.
  • Develop continuous integration and continuous delivery pipelines for machine learning models and LLM applications with automated testing, validation, and rollback features.
  • Produce infrastructure-as-code solutions to enable reproducible and version-controlled machine learning environments.
  • Design model serving systems with auto-scaling, A/B testing, and canary release capabilities.
  • Construct platforms that support large language model fine-tuning, prompt management, and large-scale experimentation.
  • Implement evaluation systems to measure LLM performance, output quality, safety, and cost efficiency.
  • Build and deploy enterprise-level AI agents and copilots with monitoring safeguards and operational controls.
  • Establish observability practices for LLMs including token tracking, latency metrics, prompt/response logging, and cost analysis.
  • Ensure high availability, reliability, and performance of machine learning and LLM services using defined service level indicators and objectives.
  • Set up comprehensive monitoring, alerting, and incident response protocols for ML systems.
  • Participate in on-call duties and lead post-incident reviews to strengthen system resilience.
  • Develop automation tools to reduce manual effort and increase the speed of ML development cycles.
  • Work closely with ML engineers and data scientists to transition research prototypes into production systems.
  • Coordinate with platform and infrastructure teams on cloud architecture design and resource efficiency.
  • Guide team members in adopting MLOps best practices, production-ready ML patterns, and operational discipline.
  • Lead technical decision-making around tools, frameworks, and architectural approaches for ML systems.

We are not able to offer nonimmigrant visa sponsorship for this position.

Required Skills
Airflow, MLOps
About the company
Sumo Logic, Inc.
Sumo Logic helps make the digital world secure, fast, and reliable by unifying critical security and operational data through its Intelligent Operations Platform. Built to address the increasing complexity of modern cybersecurity and cloud operations, the company helps digital teams move from reaction to readiness by combining agentic AI-powered SIEM and log analytics into a single platform to detect, investigate, and resolve issues. The platform enables organizations to protect against security threats, ensure reliability, and gain insights into their digital environments.
Job Details
Department: Software Development
Category: Data
Posted: 4 months ago