Responsibilities
- Create and deploy statistical and machine learning models to organize, refine, and enhance extensive unstructured datasets
- Build systems to measure data variety, redundancy, and informational value
- Formulate statistical strategies to reduce risks associated with training data selection
- Work alongside model development teams to detect data limitations and improve dataset effectiveness
- Bring experience collaborating across large foundation model initiatives and early-stage startups
- Lead strategic initiatives on data quality and define internal standards and methodologies
- Assess third-party datasets for potential adoption, prioritizing scalability, accuracy, and impact on model outcomes
- Support the creation of data evaluation scorecards
- Advance research and development of automated tools for data preprocessing and validation

