
WD Languages Landscape: statistics + dashboards
Closed, Resolved · Public

Description

  • Collect fundamental statistics from the external sources for the Wikidata Languages Landscape.
  • Develop reports, visualizations, and dashboards for the languages project.

Event Timeline

@Lydia_Pintscher @RazShuty

Something to begin with:

  • each node is a language (Wikimedia language codes are used);
  • each language points towards the three languages most similar to it, in terms of the overlap in the respective language labels across >57M Wikidata items;
  • explanation: for each language we find which WD items have a label in it; the similarity between two languages is then measured by the Jaccard distance between the two resulting binary vectors, each of length approx. 57M (the smaller the distance, the more similar the languages).
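
The computation described above can be sketched roughly as follows. This is only an illustration on toy data, not the code used for the actual graph: the function names `jaccard_distance` and `top_k_similar`, the item IDs, and the choice of languages are all made up for the example. Each language is represented by the set of items that have a label in it, which is equivalent to a binary vector with a 1 at every position in that set.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance between two sets: 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets are identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def top_k_similar(labels: dict, k: int = 3) -> dict:
    """For each language, return the k languages with the smallest Jaccard distance."""
    result = {}
    for lang, items in labels.items():
        distances = [
            (jaccard_distance(items, other_items), other)
            for other, other_items in labels.items()
            if other != lang
        ]
        # Sort by distance; ties are broken alphabetically by language code.
        distances.sort()
        result[lang] = [other for _, other in distances[:k]]
    return result


# Hypothetical toy data: language code -> set of item IDs labelled in that language.
labels = {
    "en": {"Q1", "Q2", "Q3", "Q4"},
    "de": {"Q1", "Q2", "Q3"},
    "fr": {"Q1", "Q2"},
    "sr": {"Q3", "Q4"},
    "ja": {"Q5"},
}

neighbours = top_k_similar(labels, k=3)
print(neighbours["en"])
```

At the real scale (>57M items and several hundred languages) a sparse-matrix representation would probably be needed instead of Python sets; the sets are used here only to keep the sketch short.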

Mapping WDCM item re-use statistics onto languages now.

GoranSMilovanovic renamed this task from WD Languages Landscape: fundamental statistics to WD Languages Landscape: statistics + dashboards. Oct 13 2019, 9:47 PM
GoranSMilovanovic updated the task description.
GoranSMilovanovic added a subscriber: WMDE-leszek.

@Lydia_Pintscher
You can take a look at our WikidataCon2019 shared doc and see if you can make use of anything from the "Wikidata Languages Landscape: Statistics and Visualizations" section.