
Develop Metrics for the Language Gap: explore uncertainty threshold for Wiki Commons file captions labeled as 'English'
Open, Needs Triage, Public

Description

Issue discovered while working on T372641:

  • Many Wiki Commons file captions labeled as English are actually written in other languages.
  • As a result, English is heavily overrepresented in the caption counts.

Tasks:

  • Take a sample of English-labeled captions
  • Determine what percentage of the sampled captions are actually non-English, and use that rate to set an uncertainty threshold for English caption counts.
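One way to turn the sample into an uncertainty threshold is to treat the non-English rate among English-labeled captions as a binomial proportion and report a confidence interval for it. The sketch below assumes a simple random sample, a 95% level (z = 1.96), and that each sampled caption has already been judged English or not (manually or with a language identifier). All function names are hypothetical, and the Wilson score interval is one reasonable choice, not a method this task prescribes.

```python
import math
import random


def sample_captions(captions, n, seed=0):
    """Draw a simple random sample of n captions without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(captions, n)


def required_sample_size(margin, z=1.96, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin.

    Uses the normal approximation with the worst case p = 0.5 by default.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for the proportion k/n (k non-English out of n sampled)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, center - half), min(1.0, center + half)


def adjusted_english_range(labeled_count, k, n, z=1.96):
    """Plausible range of truly English captions among labeled_count English-labeled ones.

    A higher non-English rate (hi) gives the lower bound, and vice versa.
    """
    lo, hi = wilson_interval(k, n, z)
    return labeled_count * (1 - hi), labeled_count * (1 - lo)


if __name__ == "__main__":
    # Illustrative numbers only, not real Commons data.
    n = required_sample_size(0.05)
    print("sample size for +/-5 points:", n)
    print("non-English rate interval:", wilson_interval(30, 400))
    print("adjusted English count range:", adjusted_english_range(1000, 30, 400))
```

With an interval like this, the English caption count can be reported as a range instead of a single inflated number, and the interval width shows whether the sample was large enough.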

Event Timeline

CMyrick-WMF renamed this task from Develop Metrics for the Language Gap: explore uncertainly threshold for Wiki Commons file captions labeled as 'English' to Develop Metrics for the Language Gap: explore uncertainty threshold for Wiki Commons file captions labeled as 'English'. (Mar 27 2025, 2:43 PM)