Design and manage scalable cloud infrastructure on GCP, using Kubernetes and containerization to support demanding machine learning workloads.
Develop automated workflows for training, evaluating, and releasing ML models using platforms such as Jenkins, GitHub Actions, or Airflow.
Set up observability systems to detect model drift, performance drops, accuracy changes, and latency issues in live environments.
Act as a technical liaison between data, machine learning, backend, and frontend teams to enable seamless deployment and operations.
Establish monitoring solutions that track both system-level metrics, such as uptime and latency, and ML-specific indicators, such as feature drift and data distribution changes.
Enable team-level autonomy by deploying monitoring tools that allow individual groups to oversee their own services.
Take part in on-call duties and help maintain compliance with security standards such as SOC.

Tackle meaningful customer challenges with direct and visible outcomes.
Operate within a lean and agile environment where individual initiative is recognized and valued.
See the tangible results of your work every day.
Grow your expertise by engaging with emerging technologies and markets in a dynamic, high-growth setting.
Collaborate with skilled professionals in a culture that prioritizes people.
Be part of a welcoming and respectful workplace that prohibits discrimination and harassment.

Point Wild is hiring a Principal Platform Engineer