Senior Site Reliability Engineer - B2B at CookUnity (Expired)

As a Senior Site Reliability Engineer, you will play a key role in shaping the foundation of our cloud-native systems. You'll design and maintain highly available infrastructure that supports continuous delivery, real-time data processing, and secure, compliant operations at scale.

What You'll Do

Architect and manage Kubernetes-based environments on AWS, including EKS clusters, ensuring high availability and efficient scaling.
Develop and enforce Infrastructure as Code (IaC) practices using Terraform to automate provisioning and maintain consistency across environments.
Build and optimize CI/CD pipelines in GitLab to support reliable, repeatable deployments across services.
Oversee database infrastructure, including RDS-hosted Postgres and MySQL, managing migrations, replication, and performance.
Design and maintain event-driven architectures using Kafka and Kinesis for asynchronous service communication and real-time data flow.
Implement comprehensive monitoring and observability solutions to detect, diagnose, and resolve issues proactively.
Collaborate with engineering and compliance teams to meet regulatory requirements including HIPAA, PCI, SOC 2, ISO 27001, and HITRUST.
Use security tools such as Wiz to identify and remediate infrastructure risks, maintaining a strong security posture across cloud assets.

What We're Looking For

Proven experience with AWS cloud services and advanced networking across multiple regions.
Strong background in containerization with Docker and orchestration via Kubernetes and EKS.
Hands-on expertise with Terraform for managing infrastructure at scale.
Experience managing relational databases and leading decentralized migration efforts.
Familiarity with compliance frameworks and audit processes, particularly in regulated environments.
Working knowledge of security best practices and tools, including cloud-native security platforms.
Proficiency with AI-powered engineering tools to enhance development and operations workflows.

Preferred Skills

Programming experience in Python, Node.js, Golang, or Bash.
Strong troubleshooting abilities in production environments.
Experience with GitOps workflows and tools such as ArgoCD.
Knowledge of managed Kafka services such as Amazon MSK.
Background in IoT, edge computing, or hardware integration, especially in operational environments.