As a Senior Hardware Support Engineer, you will play a central role in maintaining the reliability of large-scale production hardware infrastructure. You'll lead deep-dive investigations into complex hardware and firmware issues, identifying root causes and implementing corrective actions to minimize downtime and improve system resilience.

Key Responsibilities

Lead end-to-end analysis of critical hardware failures, tracing issues from initial symptoms to resolution
Identify recurring failure patterns and drive systemic fixes to improve fleet-wide reliability
Serve as the primary technical escalation point during high-severity hardware incidents
Collaborate with hardware vendors to coordinate diagnostics, replacements, firmware updates, and long-term remediation
Work alongside internal engineering teams to validate hardware fixes and prevent future issues
Perform pre-deployment validation of server hardware and firmware across diverse platforms
Apply structured problem-solving frameworks to diagnose and document hardware-related outages
Support on-site operations during critical events with clear technical guidance and coordination
Enhance monitoring, failure tracking, and reporting systems to improve hardware observability
Contribute to strategic initiatives aimed at increasing long-term platform stability

Qualifications

You must have extensive experience with server hardware in production environments, including deep knowledge of core components such as CPUs, memory, storage, power systems, and baseboard management controllers (BMCs). You should have a proven ability to analyze telemetry and log data to diagnose failure modes, and experience applying formal incident management methodologies to resolve issues efficiently.

Strong communication skills are essential, as the role involves coordinating across engineering, operations, and vendor teams. You must be comfortable managing multiple concurrent investigations under pressure and delivering clear technical documentation.

Preferred Experience

Experience with GPU-intensive systems, AI workloads, or high-performance computing infrastructure
Experience managing firmware lifecycles and validating large-scale rollouts
Familiarity with Linux-based environments and infrastructure automation tools
Track record of improving hardware reliability metrics across large fleets

Work Environment

This role is remote within the United States, with occasional travel required for on-site support during critical hardware events. The position operates in a fast-paced, innovation-driven culture focused on advancing AI and machine learning technologies.

Compensation & Benefits

Base salary ranges from $125,000 to $180,000 annually, with an annual performance-based bonus. Benefits include comprehensive medical, dental, and vision coverage; a 401(k) plan with company contribution; flexible paid time off; paid parental leave; and support for professional development.

Nebius is hiring a Hardware Support Engineer
