This role is responsible for managing the complete lifecycle of machine learning systems, from initial data processing through model training, deployment, monitoring, and eventual retirement. You will transform experimental AI models into resilient, production-grade services capable of handling real-world clinical data at scale.
Key Responsibilities
- Drive the full ML lifecycle, ensuring models are reliably trained, validated, deployed, and retrained in production.
- Convert research prototypes into scalable, maintainable systems using modern MLOps practices.
- Design and manage cloud infrastructure on AWS or Azure to support training, inference, and monitoring workflows.
- Build automated CI/CD pipelines for machine learning to ensure reproducibility and consistency across deployments.
- Integrate large language models and generative AI components into clinical applications, with emphasis on processing unstructured medical text.
- Develop high-performance inference APIs and pipelines that power customer-facing features.
- Apply containerization, orchestration, and Infrastructure-as-Code tools such as Docker, Kubernetes, and Terraform to manage dynamic environments.
- Implement monitoring and alerting systems to detect model drift, track performance, and uphold service level agreements.
- Optimize models and infrastructure for speed, reliability, and cost efficiency.
- Collaborate with cross-functional teams in an Agile setting to align technical execution with product objectives.
Requirements
- At least five years of experience in software or machine learning engineering roles.
- Degree in Computer Science, Engineering, or a related field, or equivalent professional experience.
- Strong programming skills in Python or Java, with a focus on clean, maintainable code.
- Proven experience deploying and scaling ML models in production environments.
- Proficiency with cloud platforms (AWS or Azure), containerization, and infrastructure automation.
- Hands-on experience with MLOps tools such as MLflow, SageMaker, or Kubeflow.
- Understanding of CI/CD, observability, and automation in ML systems.
- Familiarity with NLP techniques including tokenization, embeddings, and sequence modeling; healthcare experience is beneficial.
- Experience fine-tuning and operating LLMs and generative AI models.
- Strong problem-solving abilities and a track record of designing reliable, scalable systems.
- Excellent communication skills and experience working in distributed, collaborative teams.
- Self-motivated with the ability to contribute effectively from the start.
Preferred Qualifications
- Background in healthcare or clinical AI applications.
- Experience with Hugging Face, PyTorch, or TensorFlow.
- Exposure to agentic AI or advanced generative AI use cases.
- AWS certification at the Associate level, particularly in Machine Learning or Solutions Architecture.


