Responsibilities
- Lead end-to-end deployment and optimization projects for Solana infrastructure, including validator nodes, RPC endpoints, and indexing services.
- Drive design reviews, canary rollouts, and continuous improvements to performance and reliability.
- Own SEV 0/1 response, coordinating mitigation across teams, running postmortems, and ensuring root-cause resolution with follow-through on corrective actions.
- Define and manage service-level objectives (SLOs) and service-level agreements (SLAs).
- Build and maintain cost models and capacity planning tools to forecast infrastructure needs and control spend.
- Develop dashboards and alerting solutions using tools like Grafana and Datadog.
- Identify anomalies and trends to prevent outages before they occur.
- Implement and maintain automation via Ansible, Terraform, and Kubernetes.
- Reduce toil, accelerate deployment timelines, and ensure consistent environments across staging and production.
- Provide mentorship to engineers on deployment, observability, and Solana-specific ops.
- Review infrastructure code and monitoring configs.
- Raise the bar through shared knowledge.
- Act as a technical representative in Solana forums and community calls.
- Collaborate directly with the Solana Foundation and ecosystem contributors to troubleshoot and evolve protocol-level operations.
- Partner with internal infrastructure, platform, and support teams to solve customer-impacting issues.
- Contribute insights to architectural and product-level discussions.
- Participate in an on-call rotation, ensuring 24/7 availability for critical systems and supporting rapid incident resolution.
Requirements
- At least 5 years in Technical Operations, Site Reliability Engineering (SRE), or related roles, with proven Linux/Unix system administration and advanced troubleshooting capabilities.
- Hands-on experience operating and optimizing Solana validator nodes, RPC endpoints, and associated infrastructure at scale.
- Familiarity with the Solana protocol at a high level and with its core components.
- Proficient in analyzing validator logs, debugging RPC issues, and addressing Solana-specific operational issues.
- Solid hands-on experience with configuration management and infrastructure automation tools (Helm, Terraform, Ansible, Consul), plus containerization expertise (Docker, Kubernetes) and experience managing and scaling services in cloud environments.
- Competency in scripting/programming languages (Rust, Go, JavaScript).
- Advanced proficiency in monitoring and analytics platforms (Grafana, Datadog), enabling proactive and data-driven operational decision-making.
- Demonstrated ability to identify performance patterns, forecast potential issues, and implement preventive solutions.
- Strong track record of defining, measuring, and maintaining SLAs/SLOs, plus experience with incident response tooling and processes (PagerDuty) that ensure quick resolution and systematic root-cause analysis.
- Exceptional interpersonal and communication skills, with a proven ability to collaborate effectively across multiple teams and stakeholders.
- Self-motivated, solution-oriented, and consistently striving for operational improvements, quality enhancements, and reduced technical debt.
- Solid professional attributes, committed to transparency, accountability, and ethical behavior.
- Able to manage complexity and stay adaptable under pressure, with a habit of continuous learning and comfort working in a rapidly changing technical landscape.
- Self-starter driven by curiosity and initiative, proactively identifying opportunities, addressing gaps, and implementing solutions autonomously.
- Thrives in dynamic environments and is committed to maintaining industry leadership through close collaboration with the most innovative and talented minds in Web3.
Nice to Have
- An RHCE-level Linux certification or similar would be beneficial.
- Contributions to open-source Solana projects are an asset.
Benefits
- Competitive benefit package in all locations where we operate.
- Quarterly bonus tied to company and individual goal achievement.
Additional Information
- International salary ranges, in local currency, will be discussed during the hiring process with applicable candidates.
- This role is eligible for a quarterly bonus tied to company and individual goal achievement.
- We consider years of experience, level of proficiency in the job function, the technical competencies required, and location when determining base salary ranges for positions and levels.
- The QuickNode compensation philosophy includes pillars to ensure fair and unbiased compensation for all employees:
  - To design and deliver total reward offerings that are employee-centric.
  - To offer a competitive benefit package in all locations where we operate.
  - To prioritize attracting and retaining the best talent globally.
  - To maintain a high-performing and flexible way of working.
- During the hiring process, we are committed to discussing compensation openly and honestly.
- We encourage candidates to share their salary expectations and requirements early, allowing for an individualized discussion.
- We know that our total rewards practices impact the lives and wellbeing of our employees. Therefore, we will never stop learning about the market, our business, your needs, and how best to achieve our goals through thoughtful and data-driven practices.
- If you have any questions or require further information about the compensation for this position, please don't hesitate to reach out to your Recruiter.


