
[Research Engineering Request] Add vital-article language gap to Content Gap Metrics
Open, Stalled, Needs Triage, Public

Description

Goal

Incorporate vital-article gap into Knowledge Gap pipelines.

Details

Caroline put together code under T383925 (Develop Metrics for the Language Gap: Explore vital article coverage across Wikipedia language editions) that calculates coverage of the List of articles every Wikipedia should have for all language editions of Wikipedia. The output should largely follow the schema used by the other language gap metrics. The notebook currently uses a SPARQL query to gather the data, but that could easily be converted into a query against the item_page_link table if that is deemed a more sustainable approach.
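The coverage calculation itself is simple once the sitelink data is in hand. Below is a minimal sketch, not the code from T383925. It assumes the vital-articles list has already been resolved to Wikidata QIDs and that sitelinks arrive as (item_id, wiki_db) pairs, for example rows from the item_page_link table. The column names and the function name `vital_article_coverage` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def vital_article_coverage(
    vital_qids: set[str],
    sitelinks: Iterable[tuple[str, str]],
    wikis: Iterable[str],
) -> dict[str, float]:
    """Share of vital items that have an article on each wiki (hypothetical helper).

    vital_qids: Wikidata QIDs on the vital-articles list (assumed input).
    sitelinks:  (item_id, wiki_db) pairs; column names are assumptions
                about the item_page_link table, not its confirmed schema.
    wikis:      every language edition to report on, so wikis with no
                vital articles still appear with 0.0.
    """
    if not vital_qids:
        raise ValueError("vital_qids must not be empty")

    # Collect the distinct vital items linked from each wiki; using a set
    # means duplicate rows for the same item are counted only once.
    covered: dict[str, set[str]] = defaultdict(set)
    for item_id, wiki_db in sitelinks:
        if item_id in vital_qids:
            covered[wiki_db].add(item_id)

    total = len(vital_qids)
    return {wiki: len(covered[wiki]) / total for wiki in wikis}


if __name__ == "__main__":
    vital = {"Q1", "Q2", "Q3", "Q4"}
    links = [
        ("Q1", "enwiki"), ("Q2", "enwiki"), ("Q3", "enwiki"), ("Q4", "enwiki"),
        ("Q1", "swwiki"), ("Q1", "swwiki"),  # duplicate row is ignored
        ("Q99", "swwiki"),                   # not on the vital list
    ]
    print(vital_article_coverage(vital, links, ["enwiki", "swwiki", "yowiki"]))
```

Passing the full list of wikis explicitly, instead of deriving it from the sitelink rows, keeps editions with zero coverage in the output. Those editions are arguably the most important rows for a gap metric.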

Motivation

The existing content gap metrics already cover several facets, but we have not yet provided coverage of the Language Gap. This vital-articles gap arose from exploring what metrics might look like in that space. Ultimately, it was determined that the Language Gap is multi-faceted and also closely related to the Topics for Impact gap. The vital-articles component captures a globally important set of articles that Wikimedians have determined should exist on all Wikipedia language projects. Measuring progress against this list helps assess one aspect of these gaps. Other aspects are also under consideration (e.g., language coverage, more locally impactful topics), but the vital-article component is well defined and worth formalizing while we wait for a clearer scope for the broader Language and Topics for Impact space.

Event Timeline

Miriam changed the task status from Open to Stalled. Jul 31 2025, 9:15 AM
Miriam added subscribers: MaryMunyoki, Miriam.

One note on this task. We agreed with @CMyrick-WMF and @MaryMunyoki to wait for a new operationalizable definition of "vital articles" to be finalized as part of WE 2.1 before moving forward with this task.