
Import data from the Monumenten Inventarisatie Project (MIP)
Open, Needs Triage, Public

Description

The Monumenten Inventarisatie Project (MIP) is a database of 152,400 Dutch buildings from the period 1850-1940. The inventory for this database was compiled between 1986 and 1995, and it was released in 2011 under a CC-zero license. I've already converted the original Microsoft Access database to CSV, JSON and Excel (check under rce_mip_objecten_mdb).

This could be an interesting data source for a Wikidata project.

Event Timeline

Husky created this task. Oct 27 2018, 12:58 PM
1Veertje added a comment (edited). Oct 29 2018, 9:36 AM
    • Was used for Den Bosch, Geldermalsen, Nieuwegein, Oosterhout, Sliedrecht, Staphorst and Waalwijk
    • Heerenveen, Zaltbommel, Rotterdam, Neder-Betuwe, Middelburg, Leiden, Lingewaal, Roerdalen, Breda, Alblasserdam, Voorschoten, Zwolle, Cuijk, Bergen
Husky added a comment. Nov 6 2018, 10:09 PM

@1Veertje: I downloaded your SQL files and converted them to a CSV dump (see attachment). However, I think something may have gone wrong, because I don't see many improvements. Many of the street names seem to be truncated because of a VARCHAR limit, e.g. rijnr 8895 now has 'Burgemeester van Dobben de Bru' instead of 'Burgemeester van Dobben de Bruijnstraat'. I've also included a Jupyter Notebook in the zip file, so maybe you can see for yourself whether I missed something.
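
A quick way to check for this kind of truncation is to look for values that sit exactly at the longest length seen in a column. The sketch below is only an illustration: the function name and the sample list are made up, and the limit of 30 characters is inferred from the example above rather than taken from the actual schema.

```python
def suspect_truncated(values, limit=None):
    """Return the values whose length equals the suspected column limit.

    If no limit is given, the longest length seen in the data is used,
    on the assumption that truncated values pile up at that length.
    """
    values = [v for v in values if v]
    if not values:
        return []
    if limit is None:
        limit = max(len(v) for v in values)
    return [v for v in values if len(v) == limit]


# Hypothetical sample; only the second value comes from the dump itself.
streets = ["Vondelstraat", "Burgemeester van Dobben de Bru", "Vredenseweg"]
print(suspect_truncated(streets, limit=30))
```

A value that hits the limit is not necessarily cut off, so the result is a list of candidates to compare against the original Access database, not a list of confirmed errors.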

Husky added a comment. Nov 7 2018, 3:56 PM

Looking closer at this dataset, I doubt it's going to be very useful. There are a couple of reasons for that:

  • Matching is going to be *hard*. There is no unique identifier (like a Rijksmonumenten ID), so we need other data points, like the postal code. Unfortunately, of the 60,000 or so Rijksmonumenten on Wikidata only a very small minority (less than 1%) have postal codes. Matching on street names could be another option, but is obviously harder and more error-prone.
  • Only 17% of the more than 150,000 items in this dataset have a name.
  • Some of the more interesting data points, like architect and building style, are already available in the current Rijksmonumentenregister, usually in more detail. Much of that data hasn't been transferred to Wikidata yet, but it makes far more sense to put effort into that than into transferring old, limited data.

So I'm going to put my efforts to use this data for an import on Wikidata on hold.

valhallasw added a comment.

The matching could be done based on RD coordinates (x_coord, y_coord). For example, the first entry has RD coordinates x=247988m y=443906m (= 51°58'32.8"N 6°44'26.2"E in WGS84). The corresponding coordinates in the Rijksmonumentendatabase (https://cultureelerfgoed.nl/monumenten/526018) are x=247999m y=443876m -- not exactly the same, but note that the address is also different (Vredenseweg 102 vs Bataafseweg 18).
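
Because RD coordinates are in metres, the offset between the two points can be computed directly as a plane distance. A minimal sketch, where the 50 m tolerance and the function names are assumptions chosen for illustration:

```python
import math


def rd_distance(a, b):
    """Plane distance in metres between two RD (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_candidate(a, b, max_m=50.0):
    """True if two points are close enough to be considered a possible match.

    The 50 m default is an arbitrary assumption, not a tested threshold.
    """
    return rd_distance(a, b) <= max_m


mip = (247988, 443906)  # MIP entry
rm = (247999, 443876)   # Rijksmonumenten entry 526018

d = rd_distance(mip, rm)
print(f"{d:.1f} m apart, candidate: {is_candidate(mip, rm)}")
```

For this pair the offset is about 32 m, so any usable tolerance would have to be at least a few tens of metres.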

Of course, that's not much use if the data itself is not interesting -- but it might be an option for other data sets.

Husky added a comment. Nov 7 2018, 8:24 PM

@valhallasw that could be an interesting option, but would it also work where many monuments are close to each other, like in the centre of Amsterdam?

But indeed, I think the data itself is not interesting enough to justify the effort.

valhallasw added a comment.

That depends... I imagine the errors will be smaller in Amsterdam (I think in the case of this windmill, one database points to the windmill itself and one to the street).

If I take the following as an example:

254,509,MIPobj509,11307.00,Noord-Holland,Amsterdam,Amsterdam,Vondelstraat,1054 GT,136,138-158,,,,Concertgebouwbuurt / Vondelpar,Woonhuizen,Eclecticisme,,,,,Circa 1880-1890,,119912,486022,,"Bouwkunst; woonhuis""

we find coordinates 119912,486022 = N 52 21.658, E 4 52.326 (via https://www.gpscoordinaten.nl/converteer-rd-coordinaten.php), which is Vondelstraat 144 according to Google Maps. That seems consistent with the 138-158 range (assuming 'hno' links multiple house numbers -- I'm a bit confused why hne also contains something housenumber-y).
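
The same conversion can also be done offline. The sketch below uses a commonly circulated polynomial approximation for converting RD to WGS84; it is not what the website above necessarily does, and the coefficients are quoted as an assumption that should be checked against an authoritative source before relying on them. The function names are made up for this example.

```python
# Reference point of the RD grid (Amersfoort) and its approximate WGS84 position.
X0, Y0 = 155000.0, 463000.0
PHI0, LAM0 = 52.15517440, 5.38720621

# (p, q, coefficient) terms; each contributes coeff * dx**p * dy**q arc seconds.
LAT_TERMS = [
    (0, 1, 3235.65389), (2, 0, -32.58297), (0, 2, -0.24750),
    (2, 1, -0.84978), (0, 3, -0.06550), (2, 2, -0.01709),
    (1, 0, -0.00738), (4, 0, 0.00530), (2, 3, -0.00039),
    (4, 1, 0.00033), (1, 1, -0.00012),
]
LON_TERMS = [
    (1, 0, 5260.52916), (1, 1, 105.94684), (1, 2, 2.45656),
    (3, 0, -0.81885), (1, 3, 0.05594), (3, 1, -0.05607),
    (0, 1, 0.01199), (3, 2, -0.00256), (1, 4, 0.00128),
    (0, 2, 0.00022), (2, 0, -0.00022), (5, 0, 0.00026),
]


def rd_to_wgs84(x, y):
    """Approximate (latitude, longitude) in decimal degrees for RD x, y in metres."""
    dx = (x - X0) * 1e-5
    dy = (y - Y0) * 1e-5
    lat = PHI0 + sum(c * dx**p * dy**q for p, q, c in LAT_TERMS) / 3600.0
    lon = LAM0 + sum(c * dx**p * dy**q for p, q, c in LON_TERMS) / 3600.0
    return lat, lon


def degrees_minutes(value):
    """Split decimal degrees into whole degrees and decimal minutes."""
    degrees = int(value)
    return degrees, (value - degrees) * 60.0


lat, lon = rd_to_wgs84(119912, 486022)
print("N %d %.3f, E %d %.3f" % (*degrees_minutes(lat), *degrees_minutes(lon)))
```

For the Vondelstraat entry this should land very close to the N 52 21.658, E 4 52.326 quoted above, which is a useful sanity check on the coefficients.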

Location and address seem to map to Rijksmonumenten entry https://cultureelerfgoed.nl/monumenten/5908 - but it's not! The MIP database discusses a house while this RM entry is a manege.

And then there is also the .gml dump of the MIP database, which lists yet another RD coordinate: <gml:pos>119978 486099</gml:pos>. This dump also notes that <imkich:kenmerkendheid>lokaal</imkich:kenmerkendheid>, which might explain why it doesn't show up on its own in the RM database.

The manege is actually MIPobj2132 from the MIP database (matched by street, number and most importantly: name). That one has <gml:pos>119905 486073</gml:pos>, which is again different from all earlier values.

So... no, it will probably be a mess in cities. Too bad -- one would think triangulated coordinates would work well for this :(
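
To make the problem concrete, the three positions quoted above can be ranked by distance from the CSV coordinate of MIPobj509. This is only a sketch of a naive nearest-neighbour match; the labels and the function name are made up for the example.

```python
import math

# CSV position of MIPobj509 (the houses at Vondelstraat 138-158).
mip_csv = (119912, 486022)

# Positions from the .gml dump, as quoted above.
gml_positions = {
    "MIPobj509 (houses)": (119978, 486099),
    "MIPobj2132 (manege)": (119905, 486073),
}


def rank_by_distance(point, candidates):
    """Return (distance in metres, label) pairs, nearest first."""
    return sorted(
        (math.hypot(point[0] - x, point[1] - y), label)
        for label, (x, y) in candidates.items()
    )


for distance, label in rank_by_distance(mip_csv, gml_positions):
    print(f"{distance:6.1f} m  {label}")
```

The CSV position of the houses is about 51 m from the manege's .gml position but about 101 m from the houses' own .gml position, so a nearest-neighbour match would pick the wrong object here.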

Husky added a comment. Nov 7 2018, 10:28 PM

@valhallasw thanks for trying out a couple of things, and indeed confirming that using the location won't work. I was pretty confused by the HNO / HNE columns as well; unfortunately there doesn't seem to be any description of what those fields actually mean.

But at least I now know who to ask whenever I have a database with weird coordinates again ;)