Q2 Goal: Based on the language metrics compiled so far, recommend metric(s) to include in the Knowledge Gaps Index (i.e., per wiki; distribution across buckets).
Context:
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | CMyrick-WMF | T348246 Develop Metrics for the Language Gap |
| Resolved | | CMyrick-WMF | T376728 Develop Metrics for the Language Gap: Propose language metric(s) to be included in knowledge gap index |
Background
From A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft), within the READERS section:
3.1.4 Language. The language gap reflects the different levels of readership depending on readers’ ability to read one or more languages. What languages an individual can read greatly impacts what content is available to them and can introduce greater barriers if they are forced to read content in a language that is less familiar to them. Surveys have been conducted to estimate readers’ literacy [141, 85, 52, 21] suggesting that certain languages have highly-literate readers. For example, languages that are specific to one country show high levels of literacy amongst readers. In contrast, other languages such as English or French, which are more strongly associated with colonialism, have many readers for which English / French is not their native language [141]. In order to address this issue, in English Simple Wikipedia was introduced using a simpler grammar and a limited vocabulary. While improving readability in comparison to English Wikipedia, research has shown that its level is still not ideal for readers with limited language literacy [36]. Other initiatives attempting to bridge this gap aim at making access to content in one’s local language by growing under-represented languages such as Scribe [237], the GapFinder tool [221], Content Translation [218], or the Growing Local Language Content on Wikipedia initiative [223].
(p.11)
From A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft), within the CONTRIBUTORS section:
4.1.4 Language. The language gap is the difference between an individual’s fluency in a language and how likely they are to contribute to Wikimedia sites. Surveys have been conducted to estimate contributors’ literacy or language skills [21, 52, 79, 131, 69, 119] and the Babel system [255] is widespread on user talk pages and offers an alternative to understanding the fluency of contributors. Though it may feel intuitive that fluency would be required to contribute, lowering the barrier to contribution by lower-fluency individuals can be important for effective patrolling in small wikis [155], increase the diversity of contributors, and allow for the cross-pollination of content that might otherwise remain locked up in other languages [55]. Many editors are multilingual and contribute to Wikipedia in a variety of languages, with small wikis heavily depending on multilingual editors and English the most common second-language outside of one’s native language [69, 55, 52]. While reducing language barriers is important, it also brings risks of larger communities overshadowing the contributions of more local contributors as happened recently with Scots Wikipedia [171]. Tools like Scribe [237] have sought to address the language gap by making it easier to contribute in one’s own language even when there are not easier approaches to writing articles like Content Translation [218] available.
(p.17)
From A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft), within the CONTENT section:
5.1.4 Language. The language gap refers to the difference in content coverage across different languages. While each Wikipedia language edition is a stand-alone project, with different size and coverage of relevant topics [88, 30], other projects such as Wikidata and Wikimedia Commons are multilingual by design. However, while Wikimedia Commons is used across many languages [122], its captions and descriptions are available mainly in English. Wikidata’s labels are also nonuniformly distributed across languages, with only 11 languages holding almost 50% of all language knowledge in Wikidata, English being one of the most prominent ones [103]. Projects such as Structured Data on Commons [238] and Suggested Edits [230] aimed at rehauling the projects’ interface to make the translation efforts on Commons easier and more effective.
(p.23)
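The concentration statistic quoted in 5.1.4 (a small number of languages holding about half of all labels) can be computed from a per-language count. The following is a minimal sketch only, not a proposed metric definition: the function name `languages_covering_share` and the sample counts are hypothetical and are not real Wikidata figures.

```python
from typing import Dict


def languages_covering_share(label_counts: Dict[str, int], share: float = 0.5) -> int:
    """Return the smallest number of languages whose labels reach `share` of the total."""
    total = sum(label_counts.values())
    if total == 0:
        return 0
    running = 0
    # Walk languages from largest to smallest count, accumulating labels.
    for n, count in enumerate(sorted(label_counts.values(), reverse=True), start=1):
        running += count
        if running >= share * total:
            return n
    return len(label_counts)


# Hypothetical label counts, for illustration only (not real Wikidata figures).
example_counts = {"en": 50, "nl": 20, "fr": 15, "de": 10, "sw": 5}
top_n = languages_covering_share(example_counts, share=0.5)
```

A lower result means labels are more concentrated in a few languages. The same function could be applied to any per-language count (for example, captions on Commons) if such counts were available.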
Weekly update:
Final update:
Finished proposals for new metrics