Protege is hiring a Senior Data Scientist

Protege is looking for a Senior Data Scientist to be at the heart of how we curate, assess, and prepare the training data that powers real-world AI systems. You'll lead the evaluation and optimization of large-scale datasets used to train state-of-the-art AI models.

What You'll Do

  • Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets
  • Develop frameworks to assess data diversity, duplication, and informativeness
  • Design statistical approaches to de-risk training datasets
  • Collaborate with model training teams to identify data bottlenecks and optimize dataset performance
  • Provide leadership on data quality strategy and shape internal best practices
  • Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance
  • Help build data scorecards
  • Contribute to research and development of tools that automate data preprocessing and validation

What We're Looking For

  • PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field
  • Strong understanding of AI model training pipelines, including pre-processing and evaluation
  • Experience working with large, unstructured datasets, especially text
  • Background in statistical analysis, bias detection, and data validation
  • Ability to identify high-impact problems and drive solutions independently

Nice to Have

  • Experience with synthetic data generation or augmentation strategies
  • Publications or open-source contributions in data-centric AI or related areas
  • Experience developing evaluation frameworks or performance metrics for training data
  • Cross-functional collaboration with product, infrastructure, or partnership teams

Team & Environment

You will collaborate with research and engineering teams within our lean, fast-moving, high-trust environment. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Required Skills
machine learning, statistical modeling, Python, SQL, data visualization, A/B testing, experimental design, communication, project management, stakeholder management
About company
Protege
Protege solves the biggest unmet need in AI — getting access to the right training data. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.
Job Details
Category: data
Posted: 3 months ago