Responsibilities
- Architect and manage scalable cloud systems on GCP, using Kubernetes and container technologies to support demanding machine learning workloads.
- Develop automated workflows for training, evaluating, and releasing ML models using platforms such as Jenkins, GitHub Actions, or Airflow.
- Integrate observability solutions to monitor model drift, performance decay, accuracy, and latency in live environments.
- Act as a technical liaison between data, machine learning, backend, and frontend teams to streamline production operations.
- Establish monitoring systems that track both infrastructure health (including latency and uptime) and ML-specific indicators such as feature drift, prediction accuracy, and data distribution changes.
- Enable teams to independently monitor their services by deploying self-service observability and alerting tools.
- Take part in on-call duties and contribute to maintaining compliance with industry standards such as SOC.
Benefits
- Tackle meaningful challenges faced by real users and deliver tangible solutions.
- Witness the direct impact of your work in a lean, agile environment that values individual initiative.
- Advance your professional growth by engaging with emerging technologies, products, and markets in a rapidly evolving setting.
- Collaborate with skilled professionals in a culture that prioritizes people and teamwork.
- Help shape the direction of the organization and accelerate your own development through hands-on work.
Other
- No applicant or employee will be subjected to discrimination or harassment on the basis of race, color, ancestry, national origin, religion, age, gender, marital or domestic partnership status, sexual orientation, gender identity, disability, or veteran status.
- The company is dedicated to fostering an inclusive environment where everyone feels accepted and respected.