
Import data from the Monumenten Inventarisatie Project (MIP)
Closed, Declined (Public)

Description

The Monumenten Inventarisatie Project (MIP) is a database of 152,400 Dutch buildings from the period 1850–1940. The inventory was carried out between 1986 and 1995, and the database was released in 2011 under a CC0 (CC-zero) license. I've already converted the original Microsoft Access database to CSV, JSON and Excel (check under rce_mip_objecten_mdb).

This could be an interesting datasource for a Wikidata project.

Event Timeline

    • Was used for Den Bosch, Geldermalsen, Nieuwegein, Oosterhout, Sliedrecht, Staphorst and Waalwijk
    • Heerenveen, Zaltbommel, Rotterdam, Neder-Betuwe, Middelburg, Leiden, Lingewaal, Roerdalen, Breda, Alblasserdam, Voorschoten, Zwolle, Cuijk, Bergen

@1Veertje: I downloaded your SQL files and converted them to a CSV dump (see attachment). However, I suspect something went wrong, because I don't see many improvements. Many of the street names seem to have been truncated by a VARCHAR limit. For example, rijnr 8895 now has 'Burgemeester van Dobben de Bru' instead of 'Burgemeester van Dobben de Bruijnstraat'. I've also included a Jupyter Notebook in the zip file, so you can check for yourself whether I missed something.
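One quick way to test the truncation hypothesis: 'Burgemeester van Dobben de Bru' is exactly 30 characters long. So is 'Concertgebouwbuurt / Vondelpar', a value from the MIPobj509 record quoted in this thread. That hints at a VARCHAR(30) column somewhere in the conversion. Below is a minimal Python sketch that flags values filling such a limit exactly. The limit of 30 is an inference from these two examples, not a documented column width, and `suspect_truncated` / `SUSPECTED_LIMIT` are hypothetical names.

```python
# Hypothetical check for VARCHAR truncation in the converted dump.
# ASSUMPTION: SUSPECTED_LIMIT = 30 is inferred from the examples in this thread,
# not taken from any documented column definition.
SUSPECTED_LIMIT = 30


def suspect_truncated(values, limit=SUSPECTED_LIMIT):
    """Return the values whose length equals the suspected column limit exactly."""
    return [v for v in values if v is not None and len(v) == limit]


sample = [
    "Burgemeester van Dobben de Bru",           # rijnr 8895 after conversion
    "Burgemeester van Dobben de Bruijnstraat",  # the expected street name
    "Concertgebouwbuurt / Vondelpar",           # another 30-character value (MIPobj509)
    "Vondelstraat",
    None,                                       # empty cells are skipped
]
print(suspect_truncated(sample))
# ['Burgemeester van Dobben de Bru', 'Concertgebouwbuurt / Vondelpar']
```

Values that really are exactly 30 characters will show up as false positives. The output is a list to review by hand, not proof of truncation.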

Looking more closely at this dataset, I doubt it's going to be very useful. There are a couple of reasons for that:

  • Matching is going to be *hard*. There is no unique identifier (like a Rijksmonumenten ID), so we need other data points, such as postal codes. Unfortunately, of the 60,000 or so Rijksmonumenten on Wikidata, only a very small minority (less than 1%) have a postal code. Matching on street names could be another option, but it is obviously harder and more error-prone.
  • Only 17% of the >150,000 items in this dataset have a name.
  • Some of the more interesting data points, such as architect and building style, are already available in the current Rijksmonumentenregister, usually in more detail. Little of that data hasn't been transferred to Wikidata yet, but it makes far more sense to put effort into that than into transferring old, limited data.
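To make the postal-code route concrete, here is a minimal sketch of an address-key join in Python. It assumes both sides have been exported to lists of dicts with hypothetical `id`/`qid`, `postcode` and `housenumber` keys. The `Q-placeholder-*` ids and the house numbers are illustrative, not real Wikidata items. Only one-to-one matches are accepted, and any record without a postal code drops out entirely. That covers almost every Rijksmonument item on Wikidata, which is exactly the problem.

```python
import re

# Dutch postal code: four digits (first digit not 0) followed by two letters.
POSTCODE_RE = re.compile(r"[1-9][0-9]{3}[A-Z]{2}")


def normalize_postcode(raw):
    """Turn '1054 gt' or '1054GT' into '1054GT'; return None if it isn't a valid code."""
    if not raw:
        return None
    compact = re.sub(r"\s+", "", raw).upper()
    return compact if POSTCODE_RE.fullmatch(compact) else None


def address_key(postcode, house_number):
    """Hypothetical match key: (normalized postal code, house number as text)."""
    pc = normalize_postcode(postcode)
    return (pc, str(house_number).strip()) if pc else None


def match_by_address(mip_rows, wikidata_rows):
    """Map MIP ids to Wikidata ids, keeping only unambiguous address matches."""
    index = {}
    for item in wikidata_rows:
        key = address_key(item.get("postcode"), item.get("housenumber"))
        if key:
            index.setdefault(key, []).append(item["qid"])
    matches = {}
    for row in mip_rows:
        key = address_key(row.get("postcode"), row.get("housenumber"))
        candidates = index.get(key, []) if key else []
        if len(candidates) == 1:  # skip ambiguous or missing matches
            matches[row["id"]] = candidates[0]
    return matches


# Illustrative data only; the Wikidata ids are placeholders, not real items.
mip = [{"id": "MIPobj509", "postcode": "1054 GT", "housenumber": "136"}]
wd = [
    {"qid": "Q-placeholder-1", "postcode": "1054GT", "housenumber": 136},
    {"qid": "Q-placeholder-2", "postcode": None, "housenumber": 144},  # no postcode: unmatchable
]
print(match_by_address(mip, wd))  # {'MIPobj509': 'Q-placeholder-1'}
```

Matching on street names would need a similar key built from a normalized street name. Because of the truncation and spelling variants, that key is much less reliable.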

So I'm going to stall my efforts to use this data for an import into Wikidata.

The matching could be done based on RD coordinates (x_coord, y_coord). For example, the first entry has RD coordinates x=247988m y=443906m (= 51°58'32.8"N 6°44'26.2"E in WGS84). The corresponding coordinates in the Rijksmonumentendatabase (https://cultureelerfgoed.nl/monumenten/526018) are x=247999m y=443876m -- not exactly the same, but note that the address is also different (Vredenseweg 102 vs Bataafseweg 18).
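Because RD (Amersfoort / RD New, EPSG:28992) is a projected grid in metres, matching on coordinates can be sketched with plain Euclidean distances, as in the minimal Python example below. For the first entry, the two points are about 32 m apart. The `max_distance` threshold is an arbitrary assumption for illustration, and `nearest_within` is a hypothetical helper rather than part of any existing tool.

```python
import math


def rd_distance(a, b):
    """Distance in metres between two RD New (EPSG:28992) points.

    RD is a projected grid in metres, so Euclidean distance is a good
    approximation over the short distances involved here.
    """
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_within(point, candidates, max_distance):
    """Return the id of the closest candidate within max_distance metres, or None."""
    best_id, best_d = None, max_distance
    for cid, xy in candidates.items():
        d = rd_distance(point, xy)
        if d <= best_d:
            best_id, best_d = cid, d
    return best_id


mip_first_entry = (247988, 443906)           # MIP x_coord, y_coord
rm_candidates = {"526018": (247999, 443876)}  # Rijksmonument 526018

print(round(rd_distance(mip_first_entry, rm_candidates["526018"]), 1))  # 32.0
print(nearest_within(mip_first_entry, rm_candidates, max_distance=50))  # 526018
print(nearest_within(mip_first_entry, rm_candidates, max_distance=25))  # None
```

The choice of threshold is the weak point. It has to be large enough to absorb the offsets between the two databases, yet small enough that neighbouring monuments don't compete for the same match.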

Of course, that's not much use if the data itself is not interesting -- but it might be an option for other data sets.

@valhallasw, that could be an interesting option, but would it also work where many monuments are close to each other, like in the centre of Amsterdam?

But indeed, I think the data itself is not interesting enough to put in the effort.

That depends... I imagine the errors will be smaller in Amsterdam (I think in the case of this windmill, one database points to the windmill itself and one to the street).

If I take the following as an example:

254,509,MIPobj509,11307.00,Noord-Holland,Amsterdam,Amsterdam,Vondelstraat,1054 GT,136,138-158,,,,Concertgebouwbuurt / Vondelpar,Woonhuizen,Eclecticisme,,,,,Circa 1880-1890,,119912,486022,,"Bouwkunst; woonhuis""

we find coordinates 119912,486022 = N 52 21.658, E 4 52.326 (via https://www.gpscoordinaten.nl/converteer-rd-coordinaten.php), which is Vondelstraat 144 according to Google Maps. That seems consistent with the 138-158 range (assuming 'hno' links multiple house numbers -- I'm a bit confused why hne also contains something that looks like a house number).
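The same RD to WGS84 conversion can be done offline. Below is a minimal Python sketch using the polynomial approximation commonly published for RD to WGS84 (attributed to Schreutelkamp and Strang van Hees), with the coefficients as they usually circulate. It is an approximation with roughly metre-level accuracy. For anything serious, a proper projection library (for example pyproj, going from EPSG:28992 to EPSG:4326) is the safer choice. With these coefficients, it agrees with both conversions quoted in this thread to well within 0.0001 degrees.

```python
def rd_to_wgs84(x, y):
    """Approximate conversion from RD (EPSG:28992) x/y in metres to WGS84 lat/lon in degrees.

    Polynomial approximation with coefficients as commonly published;
    accuracy is roughly at the metre level within the Netherlands.
    """
    dx = (x - 155000) * 1e-5  # offsets from the Amersfoort reference point
    dy = (y - 463000) * 1e-5
    sum_n = (3235.65389 * dy - 32.58297 * dx**2 - 0.24750 * dy**2
             - 0.84978 * dx**2 * dy - 0.06550 * dy**3 - 0.01709 * dx**2 * dy**2
             - 0.00738 * dx + 0.00530 * dx**4 - 0.00039 * dx**2 * dy**3
             + 0.00033 * dx**4 * dy - 0.00012 * dx * dy)
    sum_e = (5260.52916 * dx + 105.94684 * dx * dy + 2.45656 * dx * dy**2
             - 0.81885 * dx**3 + 0.05594 * dx * dy**3 - 0.05607 * dx**3 * dy
             + 0.01199 * dy - 0.00256 * dx**3 * dy**2 + 0.00128 * dx * dy**4
             + 0.00022 * dy**2 - 0.00022 * dx**2 + 0.00026 * dx**5)
    # sum_n and sum_e are offsets in arc seconds from the reference latitude/longitude.
    lat = 52.15517440 + sum_n / 3600
    lon = 5.38720621 + sum_e / 3600
    return lat, lon


def degrees_minutes(value):
    """Format positive decimal degrees (true for the Netherlands) as 'D MM.mmm'."""
    d = int(value)
    return f"{d} {(value - d) * 60:.3f}"


lat, lon = rd_to_wgs84(119912, 486022)
print(degrees_minutes(lat), degrees_minutes(lon))  # 52 21.658 4 52.326
```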

Location and address seem to match Rijksmonumenten entry https://cultureelerfgoed.nl/monumenten/5908, but they don't! The MIP database describes a house, while this RM entry is a manege (riding school).

And then there is also the .gml dump of the MIP database, which lists yet another RD coordinate: <gml:pos>119978 486099</gml:pos>. This dump also notes <imkich:kenmerkendheid>lokaal</imkich:kenmerkendheid> (local), which might explain why the building doesn't show up on its own in the RM database.
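Pulling the position and the kenmerkendheid flag out of the GML dump takes only a few lines with Python's standard `xml.etree.ElementTree`. The `{*}` wildcard (Python 3.8+) matches an element name in any namespace, so the sketch does not depend on the exact namespace URIs. The fragment below is hypothetical. Only the element names `gml:pos` and `imkich:kenmerkendheid` and their values come from the dump. The wrapping `feature` element and both namespace URIs are placeholders, and the real file's structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names and values are from the MIP .gml dump,
# but the <feature> wrapper and both namespace URIs are placeholders.
SAMPLE = """<feature xmlns:gml="http://www.opengis.net/gml"
         xmlns:imkich="urn:example:imkich">
  <gml:pos>119978 486099</gml:pos>
  <imkich:kenmerkendheid>lokaal</imkich:kenmerkendheid>
</feature>"""


def read_feature(xml_text):
    """Return ((x, y), kenmerkendheid) for one feature; '{*}' ignores namespaces."""
    root = ET.fromstring(xml_text)
    pos = root.find(".//{*}pos")
    kenmerk = root.find(".//{*}kenmerkendheid")
    x, y = (float(v) for v in pos.text.split())
    return (x, y), (kenmerk.text if kenmerk is not None else None)


print(read_feature(SAMPLE))  # ((119978.0, 486099.0), 'lokaal')
```

For the full dump, streaming with `ET.iterparse` would avoid loading everything into memory at once.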

The manege is actually MIPobj2132 in the MIP database (matched by street, number and, most importantly, name). That one has <gml:pos>119905 486073</gml:pos>, which is again different from all earlier values.
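The three RD positions quoted above can be compared directly. This short Python sketch computes the pairwise Euclidean distances in metres between the CSV position of MIPobj509, its GML position, and the GML position of MIPobj2132. The labels are descriptive only, and the coordinates are the ones from this thread.

```python
import math
from itertools import combinations

# RD coordinates (metres) quoted in this thread for the Vondelstraat case.
POSITIONS = {
    "MIPobj509 (CSV)": (119912, 486022),
    "MIPobj509 (GML)": (119978, 486099),
    "MIPobj2132 (GML)": (119905, 486073),
}


def pairwise_distances(points):
    """Return {(name_a, name_b): distance in metres} for every pair of RD points."""
    return {
        (a, b): math.hypot(pa[0] - pb[0], pa[1] - pb[1])
        for (a, pa), (b, pb) in combinations(points.items(), 2)
    }


for (a, b), d in pairwise_distances(POSITIONS).items():
    print(f"{a} <-> {b}: {d:.0f} m")
# MIPobj509 (CSV) <-> MIPobj509 (GML): 101 m
# MIPobj509 (CSV) <-> MIPobj2132 (GML): 51 m
# MIPobj509 (GML) <-> MIPobj2132 (GML): 77 m
```

The CSV position of MIPobj509 lies closer to the GML position of MIPobj2132 (51 m) than to its own GML position (101 m). A nearest-neighbour match would therefore pick the manege instead of the house.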

So... no, it will probably be a mess in cities. Too bad -- one would think triangulated coordinates would work well for this :(

@valhallasw, thanks for trying out a couple of things and for confirming that using the location won't work. I was pretty confused by the HNO / HNE columns as well; unfortunately, there doesn't seem to be any description of what those fields actually mean.

But at least i now know who to ask whenever i have a database with weird coordinates again ;)

Ecritures triaged this task as Medium priority.
Ecritures moved this task from Backlog to SPARQLstation on the Wiki-Techstorm-2019 board.
Ecritures removed subscribers: Multichill, Husky.
Multichill added subscribers: Multichill, Husky.

Should we still do this? See my previous comment. I doubt this dataset is very useful.

Husky changed the task status from Open to Stalled (Nov 5 2019, 6:20 PM).
Husky lowered the priority of this task from Medium to Low.
Ecritures changed the task status from Stalled to Open (Nov 15 2019, 10:03 PM).
Ecritures raised the priority of this task from Low to Medium.
Spinster changed the task status from Open to Stalled (Nov 18 2019, 2:22 PM).
Spinster lowered the priority of this task from Medium to Low.
Spinster subscribed.
  • Some of the more interesting data points, like architect and building style are already available in the current Rijksmonumentenregister, usually more extensive. Little of that data hasn't been transferred to Wikidata yet,

With my art historian's hat on, I'd really like it if that were the case, and I will be happy to provide input to any effort that helps to add that kind of data. The MIP dataset doesn't seem to be the way to go though, so I endorse giving priority to the fresh RCE data.

One last thought: could Mix'n'match be of any help with MIP matching?

@Spinster, I don't think so. There's no unique identifier, and the data itself is pretty meagre compared to the Rijksmonumenten database. Let's not waste more effort on this dataset.