[Story] Identify Sister City inconsistencies
Closed, Resolved (Public)

Description

A few quick checks show that the sister cities listed for cities in the Wikipedias and in Wikidata are inconsistent with each other. A Bachelor's project at the HTW Berlin will look at identifying such inconsistencies and at setting up scripts that a human can select to correct Wikidata and replace the Wikipedia entries with Wikidata information.
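
As a rough illustration of the kind of check meant here, the following is a minimal sketch and not the project's actual code. It assumes the sister city lists have already been extracted from both sources into plain mappings from a city identifier to a set of sister city identifiers; the function name `compare_sister_cities` and the sample identifiers are hypothetical.

```python
def compare_sister_cities(wikipedia, wikidata):
    """Report cities whose sister city sets differ between the two sources.

    Both arguments are assumed to map a city identifier to an iterable of
    sister city identifiers (hypothetical input format).
    """
    report = {}
    # Look at every city that appears in at least one of the sources.
    for city in sorted(set(wikipedia) | set(wikidata)):
        wp = set(wikipedia.get(city, ()))
        wd = set(wikidata.get(city, ()))
        only_wp = wp - wd  # listed in Wikipedia but missing in Wikidata
        only_wd = wd - wp  # listed in Wikidata but missing in Wikipedia
        if only_wp or only_wd:
            report[city] = {
                "only_in_wikipedia": sorted(only_wp),
                "only_in_wikidata": sorted(only_wd),
            }
    return report


# Made-up sample data, purely for illustration.
sample_wikipedia = {"CityA": {"CityB", "CityC"}}
sample_wikidata = {"CityA": {"CityB", "CityD"}, "CityE": {"CityF"}}

for city, diff in compare_sister_cities(sample_wikipedia, sample_wikidata).items():
    print(city, diff)
```

A report like this only flags differences; a human would still have to decide which source is correct before anything is changed.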

Event Timeline

Is there any update about this?

Yes, Tobias successfully defended his thesis on Tuesday! He has the list produced here: https://tools.wmflabs.org/sistercities/

I chose a few of them to look at, and one can quickly see the problems: we really need SOURCES to determine which is the true sister city. For example, there are two Halifaxes, and often the wrong one is entered. Parsing the Wikipedia entries is error-prone: only four (!) cities use the sister city template, and the rest document their sister cities in lots of different ways. It turns out that cities also have sister ships, sister military regiments, etc., but there is no concept for this in Wikidata :) And then there is the problem of a city having a sub-region with the same name, where the two have different sister cities.

It is a truly wonderful example of how complicated Real Life (tm) is.

During the thesis defense we made some suggestions to Tobias as to how he could improve the tool. I'll ask him if he can post his thesis somewhere, along with a link to his code.

Tbscho added a subscriber: Tbscho. Apr 9 2017, 1:41 PM

Hey it's me, Tobias :)
I posted my thesis on my service: https://wikidata.display.name/tobias_scholz_535068_BA_thesis.pdf

The code for parsing the data structure out of Wikipedia/Wikidata, along with the list frontend, is available on GitHub: https://github.com/displayn/sistercities
Please feel free to port it from GitHub to Phabricator.

Lydia_Pintscher closed this task as Resolved. Apr 7 2018, 11:10 AM