
Develop Metrics for the Language Gap: Develop metrics for language coverage on Wiki Commons (captions)
Closed, Resolved, Public

Description

A proposed metric facet for the State of Languages Metrics is Wiki Commons coverage, to include the following proposed metrics:

Overall coverage:

  • Number of languages with captions on Commons (this ticket)
  • Number of languages with descriptions on Commons (see T374279)

Per language:

  • Status of Commons captions in the language (e.g., present, absent) (this ticket)
  • Status of Commons descriptions in the language (e.g., present, absent) (see T374279)
  • Number of captions in the language (this ticket)
  • Number of descriptions in the language (see T374279)
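The caption metrics listed above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `caption_coverage_metrics`, the input shape (a mapping from language code to caption count), and the output keys are all assumptions made for the example.

```python
from typing import Dict, Iterable


def caption_coverage_metrics(
    caption_counts: Dict[str, int], languages: Iterable[str]
) -> dict:
    """Compute overall and per-language caption metrics.

    caption_counts: hypothetical mapping of language code -> number of captions.
    languages: the language codes to report on (e.g., wiki project languages).
    """
    per_language = {}
    for lang in languages:
        n = caption_counts.get(lang, 0)
        per_language[lang] = {
            # Status of Commons captions in the language
            "status": "present" if n > 0 else "absent",
            # Number of captions in the language
            "captions": n,
        }

    # Overall coverage: number of languages with at least one caption
    languages_with_captions = sum(
        1 for m in per_language.values() if m["status"] == "present"
    )
    return {
        "languages_with_captions": languages_with_captions,
        "per_language": per_language,
    }
```

Passing the list of wiki project languages as `languages` means languages with no captions still appear in the result with status "absent", rather than being silently dropped.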

Tasks:

  • Build a notebook to aggregate caption language counts from the Commons data dump
    • Wrangle the data
    • Standardize language codes for joining with wiki project languages
  • Build notebook(s) for metrics calculation and visualization
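A rough sketch of the aggregation step, under several assumptions: that the dump is read as one JSON entity per line (with array brackets and trailing commas, as in Wikibase JSON dumps), that captions are stored under each entity's `labels` key, and that code standardization amounts to lowercasing plus an alias lookup. The `CODE_ALIASES` table and both function names are illustrative only, not a vetted mapping or the notebook's real code.

```python
import json
from collections import Counter
from typing import Iterable

# Illustrative alias table (assumption): maps a legacy code to the code
# used when joining with wiki project languages. A real table would be larger.
CODE_ALIASES = {"be-x-old": "be-tarask"}


def standardize_code(code: str) -> str:
    """Normalize a language code so counts can be joined across sources."""
    code = code.strip().lower().replace("_", "-")
    return CODE_ALIASES.get(code, code)


def count_caption_languages(lines: Iterable[str]) -> Counter:
    """Count captions per language from dump lines (one entity per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue  # skip blank lines and the enclosing array brackets
        entity = json.loads(line)
        # An entity with no captions may serialize labels as an empty list.
        labels = entity.get("labels") or {}
        for code, label in labels.items():
            if label.get("value", "").strip():
                counts[standardize_code(code)] += 1
    return counts
```

For a compressed dump, the same function could be fed from a file handle opened in text mode with `gzip.open` or `bz2.open`, so the whole file never needs to be held in memory.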

Event Timeline

CMyrick-WMF changed the task status from Open to In Progress. Aug 16 2024, 9:09 PM
CMyrick-WMF added a subscriber: Isaac.

Weekly update:

  • Issue discovered: Many file captions that are labeled as English are written in a non-English language.
    • This causes English to be heavily overrepresented in the caption counts.
    • Long-term solution: I plan to take a sample of English-labeled captions and measure what percentage are non-English, in order to derive some sort of threshold for uncertainty.
    • Short-term solution: For now, I will likely visualize and publish captions data for non-English language counts only.
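One way the sampling idea could be sketched: draw a random sample of English-labeled captions, have each one judged as English or not (by manual review or a language-identification tool, which is outside this sketch), and put a confidence interval around the observed non-English share. The Wilson score interval used here is my choice for illustration, not a method stated in this task, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_captions(captions: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw a reproducible simple random sample of up to k captions."""
    rng = random.Random(seed)
    return rng.sample(list(captions), min(k, len(captions)))


def wilson_interval(non_english: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the share of sampled captions judged non-English.

    non_english: number of sampled captions judged non-English.
    n: sample size.
    z: normal quantile (1.96 corresponds to roughly 95% confidence).
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = non_english / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

The resulting bounds could then be multiplied by the total English-labeled caption count to give a rough range for how many of those captions are actually non-English.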

Weekly update:

I will break this into two tickets: (1) file captions, and (2) file descriptions. I will mark the first ticket as complete.

Edit: new ticket located at T374279

CMyrick-WMF renamed this task from Develop Metrics for the Language Gap: Develop metrics for language coverage on Wiki Commons to Develop Metrics for the Language Gap: Develop metrics for language coverage on Wiki Commons (captions). Sep 6 2024, 8:25 PM
CMyrick-WMF closed this task as Resolved.
CMyrick-WMF updated the task description.
CMyrick-WMF updated the task description.