Omaha, Nebraska, United States (On-site)

DMSi is hiring a Site Reliability Engineer

Responsibilities

  • Assess current monitoring setups and introduce enhancements for full system observability across all platforms and environments.
  • Build and manage dashboards and reporting tools that deliver real-time insights into system performance, capacity, and resource usage.
  • Maintain stable system operations by tracking key performance metrics and ensuring optimal functionality.
  • Deliver clear visibility into system conditions to support consistent and high-quality user experiences.
  • Optimize alerting configurations to reduce false alarms and ensure precise, timely notifications for urgent issues.
  • Create structured escalation paths and incident response procedures to improve resolution efficiency.
  • Examine monitoring outputs to detect patterns, irregularities, and opportunities for system improvements.
  • Use data analysis, including machine learning techniques that distinguish normal from abnormal behavior, to produce practical recommendations for teams.
  • Collaborate with software developers, DevOps personnel, and other stakeholders to align monitoring practices with technical and business objectives.
  • Design and support automation scripts and utilities that simplify monitoring workflows and reduce manual intervention.
  • Maintain thorough documentation of monitoring frameworks, alerting protocols, and recommended practices.
  • Offer training and support to help teams effectively use monitoring tools and understand performance data.
  • Regularly evaluate and update monitoring strategies to keep pace with evolving technologies and organizational needs.
  • Keep current with advancements and emerging solutions in the field of system observability.

Work Arrangement

On-site

Other

  • Work takes place in a standard office setting with regular use of computers and phones; no significant physical requirements are involved.
  • Occasional travel is required, which may include commercial flights and rental vehicles for business purposes.

Required Skills

  • Prometheus
  • Grafana
  • ELK Stack
  • Datadog
  • Python
  • Bash
  • PowerShell
  • AWS
  • Microsoft Azure
  • GCP
  • SaaS
  • Monitoring
  • Automation

Job Details

  • Department: Engineering
  • Category: Infrastructure
  • Posted: 3 months ago