Context:
As part of T376728, the following three language gap metrics were proposed for the Knowledge Gap index:
- Language representation across projects: which languages have which Wikimedia projects, and their level of representation. Similar to the canonical wikis.csv, but would also include test projects in the Incubator, Multilingual Wikisource, and Wikiversity Beta; additionally, it would include linguistic, population, and geographic information for each language.
- Vital article coverage: Wikipedia language versions' coverage of vital articles, articles every Wikipedia should have, and/or topics for impact
- Language article coverage: Wikipedia language versions' coverage of articles about their own language, related languages, and other relevant languages.
Purpose:
This task focuses on exploring and further developing the second metric, "Vital article coverage".
The resulting dataset would sit at the intersection of the language gap and the topic content gap.
Analysis:
The analysis will start by trying to answer the following question; further exploration may be conducted based on the data gathered.
- How does coverage of the articles every Wikipedia should have vary by Wikipedia language edition (per article section), in terms of:
  - Article quantity
  - Article quality
  - Monthly pageviews
  - Monthly revisions
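As a rough illustration of the shape this analysis could take, the sketch below aggregates the four measures per wiki and per article section. Everything in it is an assumption made for illustration: the field names, the sample rows, the `vital_list_sizes` mapping, and the `summarize_coverage` helper are hypothetical. The schema is still to be finalized, and the quality score is treated here as a number between 0 and 1 without implying any particular quality model.

```python
from collections import defaultdict

# Hypothetical input: one row per vital article that exists on a given wiki.
# Field names, wikis, sections, and numbers are made up for illustration.
rows = [
    {"wiki": "enwiki", "section": "Science", "quality": 0.5, "pageviews": 1200, "revisions": 14},
    {"wiki": "enwiki", "section": "Science", "quality": 1.0, "pageviews": 800, "revisions": 6},
    {"wiki": "swwiki", "section": "Science", "quality": 0.25, "pageviews": 100, "revisions": 2},
]

# Hypothetical number of articles in each section of the vital article list.
vital_list_sizes = {"Science": 4}


def summarize_coverage(rows, vital_list_sizes):
    """Aggregate quantity, quality, pageviews, and revisions per (wiki, section)."""
    totals = defaultdict(
        lambda: {"articles": 0, "quality_sum": 0.0, "pageviews": 0, "revisions": 0}
    )
    for row in rows:
        bucket = totals[(row["wiki"], row["section"])]
        bucket["articles"] += 1
        bucket["quality_sum"] += row["quality"]
        bucket["pageviews"] += row["pageviews"]
        bucket["revisions"] += row["revisions"]

    summary = {}
    for (wiki, section), bucket in totals.items():
        count = bucket["articles"]
        summary[(wiki, section)] = {
            "articles": count,
            # Share of the section's vital articles that exist on this wiki.
            "coverage": count / vital_list_sizes[section],
            "mean_quality": bucket["quality_sum"] / count,
            "monthly_pageviews": bucket["pageviews"],
            "monthly_revisions": bucket["revisions"],
        }
    return summary


summary = summarize_coverage(rows, vital_list_sizes)
for (wiki, section), metrics in sorted(summary.items()):
    print(wiki, section, metrics)
```

Keying the summary on the (wiki, section) pair keeps the language dimension and the topic dimension side by side, which is the intersection this task is meant to explore. A production version would more likely run as a query over existing datasets than as in-memory Python.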
Q3 tasks:
- Solicit feedback from Research team
- Exploratory analysis
- Share exploratory analyses with Community Growth and LPL teams
- Finalize schema
- Discuss and determine productionization possibilities
