
Analyze African libraries dataset
Closed, ResolvedPublic


A new dataset was added recently by a volunteer, which looks extremely interesting:

CC0 license and data from several countries.

Have a look at it to see how useful it is to us and what a reasonable plan to work with it would be.

Event Timeline

Jopparn added a subscriber: Sebastian_Berlin-WMSE.
Jopparn added a subscriber: Jopparn.

I would suggest that Sebastian take the lead on this.

Alicia_Fagerving_WMSE added a comment. Edited May 28 2019, 7:29 AM

I don't see any tricky points with this dataset, so I think it's a good idea. It has actually been updated in the last few weeks, so there are now about 190 items in it. That's still not a lot, considering it covers several countries (I count at least 6), but it's a good start. The data itself is not very complicated: it's basic info like name/town/country/coordinates, and I don't see much that needs cleaning up (maybe some all-caps strings). Most of these libraries, and I wouldn't be surprised if all of them, don't exist on Wikidata, so there's no tricky matching. Technically, it looks like it can be loaded into OpenRefine and processed without major hiccups.

I've managed to reconcile country and library type for all but one and location for over half. Coordinate location was easy enough to get by just concatenating two columns.
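For reference, the column concatenation could be sketched outside OpenRefine roughly as below. This is only an illustration: the helper name `to_coordinate` is hypothetical, the dataset's actual column names are not given here, and the "lat,lon" output format is an assumption about what the upload step expects.

```python
def to_coordinate(lat, lon):
    """Join a latitude and a longitude cell into a 'lat,lon' string.

    Returns None when either cell is missing, non-numeric, or out of range,
    so that bad rows can be reviewed by hand instead of being uploaded.
    """
    try:
        lat_f = float(lat.strip())
        lon_f = float(lon.strip())
    except (ValueError, AttributeError):
        # Empty string, free text, or a None cell.
        return None
    if not (-90 <= lat_f <= 90 and -180 <= lon_f <= 180):
        return None
    return f"{lat_f},{lon_f}"


if __name__ == "__main__":
    # Hypothetical sample cells, not taken from the dataset.
    print(to_coordinate("5.6037", "-0.187"))
    print(to_coordinate("", "36.8"))
```

The range check is there mainly to catch swapped or mistyped columns before anything is sent to Wikidata.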

Street address could also be added, though these are formatted quite inconsistently; you have e.g. "10 Hospital Road", "GA-563-6516" and "50317 Chavakali". There is also contact information, including e-mail addresses, but again it is quite inconsistent: sometimes it refers to a person and sometimes to a library.
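To get a feel for how the address values are distributed before deciding what to upload, they could be bucketed by shape. The sketch below is a rough guess based only on the three examples above; the category labels and patterns are assumptions, and real data will likely need more cases.

```python
import re

# Patterns guessed from the three sample values; all labels are hypothetical.
CODE = re.compile(r"^[A-Z]{2}-\d{3,4}-\d{4}$")          # e.g. "GA-563-6516"
STREET = re.compile(
    r"^\d+\s+.*\b(Road|Street|Avenue|Rd|St|Ave)\b", re.IGNORECASE
)                                                        # e.g. "10 Hospital Road"
POSTCODE_TOWN = re.compile(r"^\d{5}\s+\D+$")             # e.g. "50317 Chavakali"


def classify_address(value):
    """Return a coarse label describing the shape of an address string."""
    text = value.strip()
    if not text:
        return "empty"
    if CODE.match(text):
        return "code"
    if STREET.match(text):
        return "street"
    if POSTCODE_TOWN.match(text):
        return "postcode_town"
    return "other"


if __name__ == "__main__":
    for sample in ["10 Hospital Road", "GA-563-6516", "50317 Chavakali"]:
        print(sample, "->", classify_address(sample))
```

Counting the labels per country would show whether only the "street" group is consistent enough to add as a street address.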

Lastly, there are three different ids: two that appear to be specific to this database, "OBJECTID" and "OBJECTID_1", and one that is some kind of hash, named "GlobalID". I haven't found any further information about the ids.

Also, only five of the library names were matched when reconciling.