topic overlap between Wikipedia language versions
Open, Needs Triage, Public


The different language Wikipedias cover very different topics in their articles. With the sitelinks on Wikidata we have data to analyze this further. It'd be useful to have an overview of the overlap of articles between the different language versions of Wikipedia. We want to make the result of this actionable.

This could look something like this:

|               | not covered in enwp | not covered in dewp | not covered in frwp |
| enwp articles | -                   | 10                  | 50                  |
| dewp articles | 42                  | -                   | 12                  |
| frwp articles | 15                  | 150                 | -                   |

Each cell could then link to a list of missing topics to make it actionable. Preferably the list would be ordered by the number of other Wikipedias that cover the topic.
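
As a rough illustration of the idea, here is a minimal Python sketch that builds both the count matrix and the per-cell lists of missing topics. It assumes the sitelink data has already been extracted into a mapping from Wikidata item ID to the set of wikis that have an article for it. The sample data, the `missing_topics` and `overlap_matrix` helpers, and the wiki identifiers are all hypothetical and only serve the example; this is not an existing tool or API.

```python
from collections import defaultdict

# Hypothetical sample data: Wikidata item ID -> wikis that have a sitelink.
# Real data would have to be extracted from Wikidata sitelinks first.
SITELINKS = {
    "Q1": {"enwiki", "dewiki", "frwiki"},
    "Q2": {"enwiki", "dewiki"},
    "Q3": {"enwiki"},
    "Q4": {"dewiki", "frwiki"},
    "Q5": {"frwiki"},
}
WIKIS = ["enwiki", "dewiki", "frwiki"]


def missing_topics(sitelinks, wikis):
    """Map (source, target) to the items that source covers and target lacks.

    Each list is ordered by how many wikis cover the item (most first),
    with the item ID as a tie-breaker so the order is deterministic.
    """
    missing = defaultdict(list)
    for item, covered in sitelinks.items():
        for source in wikis:
            if source not in covered:
                continue
            for target in wikis:
                if target != source and target not in covered:
                    missing[(source, target)].append(item)
    for items in missing.values():
        items.sort(key=lambda item: (-len(sitelinks[item]), item))
    return dict(missing)


def overlap_matrix(missing, wikis):
    """Count table: rows are source wikis, columns are 'not covered in' wikis."""
    return {
        source: {
            target: None if source == target else len(missing.get((source, target), []))
            for target in wikis
        }
        for source in wikis
    }


if __name__ == "__main__":
    missing = missing_topics(SITELINKS, WIKIS)
    matrix = overlap_matrix(missing, WIKIS)
    for source in WIKIS:
        print(source, matrix[source])
```

Each `(source, target)` key corresponds to one cell of the table, so the list stored under it is what that cell would link to. The diagonal is left as `None` because a wiki cannot be missing its own articles.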


  • We should make it clear that there are good reasons for some topics not being covered in a Wikipedia and it is not always necessary to create a new article. These reasons can include:
    • the topic is not considered notable for that Wikipedia
    • the topic is covered, but only as part of another article (for example, as a paragraph)
  • Later this could be expanded to the other Wikimedia projects.

See also:
T200859: Add "haswbsitelink" to find items missing in a certain wiki
T236992: Order Wikidata search result by number of statements/labels/sitelinks/identifiers

Event Timeline

@Lydia_Pintscher @Manuel @WMDE-leszek

Before we proceed with this, please take a look at our WDCM Sitelinks Dashboard:

  • Wiki View tab and then
    • Wiki Similarity

I would say that the similarity graph presented there is pretty close to what you are looking for.

Maybe we should just think about extending the functionality of this WDCM system component instead of going for a new data product?