Responsibilities
- Architect, implement, and manage scalable cloud environments on GCP, using Kubernetes for container orchestration, to support demanding machine learning workloads.
- Develop automated workflows for model training, evaluation, and deployment using tools such as Jenkins, GitHub Actions, or Airflow.
- Integrate observability solutions to monitor model behavior, including accuracy decay, latency changes, and performance issues in live environments.
- Facilitate collaboration between data, machine learning, backend, and frontend teams to ensure seamless operational workflows.
- Establish monitoring systems for both infrastructure health (e.g., latency, uptime) and ML-specific indicators like feature drift and data distribution changes.
- Enable team autonomy by deploying monitoring tools that allow individual groups to track their own system performance.
- Participate in the on-call rotation and help maintain compliance with standards such as SOC.
Team
- High-trust, outcome-focused team
Other
- All individuals, whether employees or applicants, are protected from discrimination and harassment based on race, color, ancestry, national origin, religion, age, gender, marital or domestic partner status, sexual orientation, gender identity, disability, or veteran status.
- The company is dedicated to fostering an inclusive environment where everyone feels welcomed and valued.