As a Senior Infrastructure Engineer focused on observability, you will take ownership of critical systems that monitor and improve the health of production environments. Working across multiple data centers and Kubernetes clusters, you will design and refine the infrastructure behind logs, metrics, and distributed traces to provide real-time visibility and keep production systems reliable.
What You’ll Do
- Build and scale observability pipelines that support a growing, distributed architecture
- Lead the evolution of logging infrastructure, from ingestion to storage and querying
- Enhance tracing coverage across services and promote consistent adoption in engineering teams
- Manage and automate components such as EKS control planes, telemetry agents, and data collectors
- Reduce manual effort by automating monitoring and alerting workflows
- Ensure systems meet compliance standards for access control, data integrity, and auditability
- Evaluate emerging tools and frameworks, making informed decisions on adoption
What We’re Looking For
- At least 8 years of hands-on experience with production-grade observability platforms
- Proven expertise in designing and maintaining logging pipelines at scale
- Strong understanding of distributed tracing and service performance monitoring
- Experience managing Kubernetes clusters, particularly EKS, including lifecycle operations
- Knowledge of storage solutions tailored for high-volume observability data
- Ability to balance deep technical work with strategic planning and cross-team collaboration
- Fluent English communication for technical and architectural discussions
Nice to Have
- Hands-on experience with OpenTelemetry, Kafka, Vector, or VictoriaMetrics
- Exposure to AI-driven automation in infrastructure, such as root cause analysis or deployment workflows
- Familiarity with MCP implementations or migration projects involving cloud environments
- Background in audit preparation, including access controls and data pipeline integrity
- Networking and incident response experience in production systems, including post-mortem analysis
- Proficiency in Python, Golang, or Java for tooling and scripting
Environment and Culture
The team values autonomy, innovation, and collaboration. You'll work in a flexible, trust-based environment where ownership is encouraged and productivity is balanced with personal well-being. Relocation support is available, so candidates from anywhere in the world can join the team.


