
Investigate connected images in KMB
Closed, Resolved, Public

Description

Investigate how many of the images in KMB (Kulturmiljöbild):

  1. Link to an object in either BeBR or FMIS
  2. Are linked to from an object in BeBR or FMIS
    • there are none from FMIS
  3. Are already uploaded to Commons (independent of filesize)

Event Timeline

Out of the 237,355 images in KMB:

  • There are roughly 10,000 images linking to BeBR from KMB.
  • There are roughly 2,150 images on Commons linking to KMB via kmb.raa.se links
  • There are roughly 1,000 images on Commons linking to KMB via kulturarvsdata.se links

Note that the latter two groups may overlap with each other, and neither needs to overlap with the first.
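Because the Commons links come in two URL forms, their overlap can only be measured once both are reduced to a bare KMB identifier. A minimal sketch, assuming the two forms look like the examples elsewhere in this task (`http://kmb.raa.se/cocoon/bild/show-image.html?id=<id>` and `http://kulturarvsdata.se/raa/kmb/<id>`); the helper names and sample link lists are hypothetical:

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical pattern for the kulturarvsdata.se (K-samsök) URI form.
_KSAMSOK = re.compile(r"kulturarvsdata\.se/raa/kmb/(\d+)")

def kmb_id(url):
    """Return the KMB id in url, or None if neither known form matches."""
    m = _KSAMSOK.search(url)
    if m:
        return m.group(1)
    parsed = urlparse(url)
    if parsed.netloc == "kmb.raa.se":
        ids = parse_qs(parsed.query).get("id")
        if ids:
            return ids[0]
    return None

def overlap(kmb_links, ksamsok_links):
    """Split into (only via kmb.raa.se, only via kulturarvsdata.se, via both)."""
    a = {i for i in map(kmb_id, kmb_links) if i}
    b = {i for i in map(kmb_id, ksamsok_links) if i}
    return a - b, b - a, a & b

# Made-up link lists for illustration
via_kmb = ["http://kmb.raa.se/cocoon/bild/show-image.html?id=16000300012962"]
via_ksamsok = ["http://kulturarvsdata.se/raa/kmb/16000300012962",
               "http://kulturarvsdata.se/raa/kmb/16001000095544"]
only_kmb, only_ksamsok, both = overlap(via_kmb, via_ksamsok)
print(len(only_kmb), len(only_ksamsok), len(both))  # 0 1 1
```

The same `kmb_id` helper could be applied to the BeBR-linked set to check its overlap with either Commons group.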

Old (and dirty) code for matching KMB metadata to Commons templates can be found in the RAA-tools repo.

Found old mapping tables at commons:User:Lokal_Profil/nycklar/... which can be used as a basis for further mapping.

Have also made a first stab at using ksamsok_py and my old appspot tool to see if any other data needs to be extracted from the RAA-record for an image.

ksamsok_py bases its information on the XML representation, whereas the old appspot tool uses the RDF representation. Since the RDF contains unique identifiers for civil parishes and municipalities, it is likely the better way to go.

Currently running a small script to get the parsed data from the appspot tool for the 9850 linked images. This will allow me to update/extend the mapping tables for keywords and creators.

For the 9850 images linked to a BBR or FMIS entry, I've isolated the tags (itemKeyWord + itemClassName) and photographers (byline), along with their frequencies, in P4867.
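A minimal sketch of that tallying step, assuming each image has already been parsed into a dict with the keys `itemKeyWord` and `itemClassName` (lists) and `byline` (a string). That record shape and the sample values are hypothetical stand-ins for whatever the parsing script actually returns:

```python
from collections import Counter

def tally(records):
    """Count tag and photographer frequencies over parsed KMB records."""
    tags = Counter()
    photographers = Counter()
    for rec in records:
        # Tags are keywords plus class names, as described above.
        tags.update(rec.get("itemKeyWord", []))
        tags.update(rec.get("itemClassName", []))
        if rec.get("byline"):
            photographers[rec["byline"]] += 1
    return tags, photographers

records = [  # made-up sample records
    {"itemKeyWord": ["kyrka"], "itemClassName": ["byggnad"], "byline": "Photographer A"},
    {"itemKeyWord": ["kyrka", "gravsten"], "itemClassName": [], "byline": "Photographer A"},
    {"itemKeyWord": [], "itemClassName": ["byggnad"], "byline": ""},
]
tags, photographers = tally(records)
print(tags.most_common(2))          # [('kyrka', 2), ('byggnad', 2)]
print(photographers.most_common())  # [('Photographer A', 2)]
```

Records with an empty byline are skipped rather than counted under an empty name, which keeps the photographer list usable as a basis for creator templates.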

While doing this I also encountered the following issues:

Also:

  • Every image has at least one tag
  • 13 images lack a motive
  • 521 images lack a description (no overlap with the previous 13)

Photographers are getting creator templates through the list at: https://commons.wikimedia.org/wiki/Institution:Riksantikvarie%C3%A4mbetet/KMB/creators

Regarding the issues:

  • The 7 entries with cz have been reported to KMB.
  • The 2 entries without municipalities have been reported to KMB.

Looking at Wikidata coverage of P777 (socken/ATA) codes, I've put together P4871, which identifies the codes missing on Wikidata (as compared to Commons).

The codes on Wikidata but not on Commons should not be a cause for concern.
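The comparison behind P4871 amounts to two set differences. A minimal sketch, assuming the codes have already been fetched from both sides as strings; the sample values are made up and do not reflect the real P777 format:

```python
def compare_codes(commons_codes, wikidata_codes):
    """Return (on Commons but missing on Wikidata, on Wikidata only), both sorted."""
    commons = {c.strip() for c in commons_codes if c.strip()}
    wikidata = {c.strip() for c in wikidata_codes if c.strip()}
    return sorted(commons - wikidata), sorted(wikidata - commons)

missing, extra = compare_codes(["0001", "0002 ", "0003"], ["0002", "0004"])
print(missing, extra)  # ['0001', '0003'] ['0004']
```

Only the first list needs action; the second corresponds to the harmless Wikidata-only codes.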


Here's a new set of triples, this time with licensing information. For all KMB images which have an assigned license and depict one or more ancient monuments or historic buildings, the corresponding license and depicted heritage assets are given.
There are c. 1,500 fewer images in this file than in the previous one, corresponding to images in KMB with no assigned licensing information (these must be assumed to be all rights reserved).

@Carwash For the last 5ish years (since my stint as Wikimedia Developer in Residence) http://kmb.raa.se/cocoon/bild/bildanvandning.html has been interpreted to mean that if copyright=RAÄ then commercial use is allowed and CC BY should be assumed (with credit given to "<photographer> / Riksantikvarieämbetet"). For images with any other copyright value and no license, it was assumed that the image was unfree.

Is that assumption no longer valid?

@Lokal_Profil I'm still waiting on confirmation of this from the archive, but an interim - if perhaps glib - answer is that all the images we have which are marked as CC-BY or PD have been released under those licenses after having been cleared by the archive; images without such a license have not been so cleared (yet).

@Lokal_Profil I've now received a reply from the archives: They have an ongoing clearing project for archival material which covers CC-BY licensing of photographs, among other things. For images still within copyright, this entails getting permission from the photographers to license their photos under CC-BY, and there is always the possibility that they will refuse, so CC-BY cannot be assumed. With the possible exception of some photographs of drawings, all the material that can be released with PD-mark has been released. There are also a number of undated photographs, and it remains to be seen how they will be handled.
I hope that helps to clarify things.

Thanks for the update and clarification. Will use your new list as a basis for the first upload and will filter any subsequent selections on the license rather than copyright tag.
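Selecting on the license field rather than the copyright tag can be done directly on the triples. A minimal sketch, assuming line-based N-Triples with the `ksamsok#mediaLicense` and `ksamsok#visualizes` predicates seen in the excerpts in this task; the regular expression only handles triples whose object is a URI, and the `ACCEPTED` set is an assumption covering the two licenses that occur in this data:

```python
import re
from collections import defaultdict

KSAMSOK = "http://kulturarvsdata.se/ksamsok#"
# Assumed accepted licenses: the two that occur in the KMB triples.
ACCEPTED = {
    "http://creativecommons.org/licenses/by/2.5/",
    "http://creativecommons.org/publicdomain/mark/1.0/",
}
# Matches "<s> <p> <o> ." lines with URI objects only.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")

def uploadable(lines):
    """Map each image URI with an accepted license to the assets it visualizes."""
    licenses = {}
    depicts = defaultdict(set)
    for line in lines:
        m = TRIPLE.match(line)
        if not m:
            continue  # skip blank lines, comments and literal-valued triples
        s, p, o = m.groups()
        if p == KSAMSOK + "mediaLicense":
            licenses[s] = o
        elif p == KSAMSOK + "visualizes":
            depicts[s].add(o)
    return {s: depicts[s] for s, lic in licenses.items() if lic in ACCEPTED}

KMB = "http://kulturarvsdata.se/raa/kmb/"
sample = [
    f"<{KMB}16001000011191> <{KSAMSOK}mediaLicense> <http://creativecommons.org/licenses/by/2.5/> .",
    f"<{KMB}16001000011191> <{KSAMSOK}visualizes> <http://kulturarvsdata.se/raa/bbr/21400000209820> .",
    f"<{KMB}99> <{KSAMSOK}visualizes> <http://kulturarvsdata.se/raa/fmi/1> .",  # made up, no license
]
selected = uploadable(sample)
```

Images without any `mediaLicense` triple never enter `licenses`, so they are excluded regardless of their copyright value.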

Might be a good point to also clarify http://kmb.raa.se/cocoon/bild/bildanvandning.html (which was likely written before the license field was introduced).

Thanks, I'll pass that on.

By the way, the object listed as fmis:fd856902 should really be fmis:10008501160001.
This error has been reported and will hopefully be rectified soon; the same goes for some other errors in the data which @Ainali has reported.

Regarding the issues:

  • The 7 entries with cz have been reported to KMB.
  • The 2 entries without municipalities have been reported to KMB.

These have been fixed.

The images marked Lundberg/Hildebrand have been reported and are about to be fixed too.

By the way, the object listed as fmis:fd856902 should really be fmis:10008501160001

This has now been corrected at KMB: http://kmb.raa.se/cocoon/bild/show-image.html?id=16000300012962

Thanks!

Also making a note here (for later) that the mapping related to P4871 (100% coverage of P777) is complete thanks to @Ainali.

Regarding the image links to Kulturmiljöbild in the data provided which are now (May 2017) returning 404 errors (as documented in T165277) the explanation from our archives division is as follows:

In accordance with the ongoing copyright and licensing clearance work outlined above, the images have been removed because a licence agreement had not been reached with the photographer. If and when such a licence is agreed upon (or the copyright expires) the images can be made public again.

This, however, does not explain why the images had PD licences in their metadata when the data was harvested in February, which concerns me. Awaiting clarification from the archives division.

UPDATE: Archives suggest that the images in question may have previously been misclassified as PD, an error which has in that case now been remedied. Unfortunately, the remedy is not in our favour.

@Carwash
Thanks for investigating. From this project's point of view, that resolves the issue.

From a personal-interest point of view: why delete the whole record (with its persistent identifier!) rather than just changing the license in the data?

@Lokal_Profil The records have not been deleted per se, but they have been marked as being no longer publicly available. I strongly agree that removing the URI and the metadata as well is not a good outcome. I will look into what we can do about that, but if I understand the systems involved it may not be possible for KMB to make the metadata record only, and not the image, available for harvesting by SOCH - I think it's all or nothing. :(

Update:

@Lokal_Profil @Ainali
The last cache of triples (the raa-kmb-licenses.ttl file uploaded above) was harvested on 2017-02-02. I've now run a new harvest of triples, dated 2017-06-27, which includes a large number of additional images which had not been licensed in February. It also includes a small number of updates and deletions; these are described below and I would recommend reading what has changed and why.

In order to facilitate comparing caches, I have converted the original upload to its Canonical N-Triples form, which I attach here, along with a diff between that cache (2017-02-02) and the new one (2017-06-27), detailing what has changed.

If it's convenient for you, I will provide any future updates in diff format using Canonical N-Triples serialisation. If this is not useful, please let me know.
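A rough sketch of making two caches comparable and listing what changed, assuming neither cache contains blank nodes or string literals. Real Canonical N-Triples also fixes escaping and blank-node labels, which this does not attempt; it only collapses whitespace, drops duplicates and sorts. The function names are hypothetical:

```python
def normalise(lines):
    """Approximate a canonical form for URI-only N-Triples: collapse runs of
    whitespace, drop blanks, comments and duplicates, then sort."""
    triples = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            triples.add(" ".join(line.split()))
    return sorted(triples)

def diff_caches(old_lines, new_lines):
    """Return (removed, added) triples between two caches."""
    old, new = set(normalise(old_lines)), set(normalise(new_lines))
    return sorted(old - new), sorted(new - old)

S = "<http://kulturarvsdata.se/raa/kmb/16000300012962>"
P = "<http://kulturarvsdata.se/ksamsok#visualizes>"
old = [f"{S} {P} <http://kulturarvsdata.se/raa/fmi/fd856902> ."]
new = [f"{S}  {P} <http://kulturarvsdata.se/raa/fmi/10008501160001> .", ""]
removed, added = diff_caches(old, new)
```

Writing `normalise` output to files and running a line diff tool such as `diff` over them gives hunk-style output like the `338c384` entries in this task; the set version above just reports the two sides.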

Here follow some notes about what has changed since February, and the reasons for those changes:

Additions:

The overwhelming majority of the changes concern additions since 2nd February 2017. In most cases this is likely to be due to the ongoing clearing work being done to assign correct licenses to images in the KMB catalogue. Approximately 1423 images have been added.

Updates:

The following triples have changed:

338c384
< <http://kulturarvsdata.se/raa/kmb/16000300012962> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/fd856902> .
---
> <http://kulturarvsdata.se/raa/kmb/16000300012962> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10008501160001> .

In this case, the FMIS-id was invalid, and has been fixed at the source. You may recall that this was addressed earlier.


12982c15627
< <http://kulturarvsdata.se/raa/kmb/16001000095544> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10100900120001> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000095544> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10099800120001> .
12984c15629
< <http://kulturarvsdata.se/raa/kmb/16001000095570> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10100900120001> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000095570> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10099800120001> .
12986c15631
< <http://kulturarvsdata.se/raa/kmb/16001000095572> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10100900120001> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000095572> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10099800120001> .
12988c15633
< <http://kulturarvsdata.se/raa/kmb/16001000095574> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10100900120001> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000095574> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10099800120001> .
12990c15635
< <http://kulturarvsdata.se/raa/kmb/16001000095576> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10100900120001> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000095576> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10099800120001> .
12992c15637
< <http://kulturarvsdata.se/raa/kmb/16001000095578> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10100900120001> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000095578> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10099800120001> .
12994c15639
< <http://kulturarvsdata.se/raa/kmb/16001000095580> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10100900120001> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000095580> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10099800120001> .

In this case, a series of seven images that were thought to depict the ancient monument (lämning) Ramdala 12:1 have been corrected to reflect the fact that they actually depict Jämjö 12:1.
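Paired `<`/`>` lines like these are easier to review when grouped per image. A minimal sketch, assuming the triples have already been split into (subject, predicate, object) tuples; the function names are hypothetical. It reports, for each image, the `visualizes` targets dropped and added between two caches, using one of the seven images above as input:

```python
from collections import defaultdict

VISUALIZES = "http://kulturarvsdata.se/ksamsok#visualizes"

def targets(triples):
    """Group visualizes objects by image URI."""
    by_image = defaultdict(set)
    for s, p, o in triples:
        if p == VISUALIZES:
            by_image[s].add(o)
    return by_image

def changed_targets(old_triples, new_triples):
    """Return {image: (dropped, added)} for every image whose depicted objects differ.
    Images present on only one side appear with an empty dropped or added set."""
    old, new = targets(old_triples), targets(new_triples)
    changes = {}
    for image in old.keys() | new.keys():
        dropped = old.get(image, set()) - new.get(image, set())
        added = new.get(image, set()) - old.get(image, set())
        if dropped or added:
            changes[image] = (dropped, added)
    return changes

IMG = "http://kulturarvsdata.se/raa/kmb/16001000095544"
FMI = "http://kulturarvsdata.se/raa/fmi/"
old = [(IMG, VISUALIZES, FMI + "10100900120001")]
new = [(IMG, VISUALIZES, FMI + "10099800120001")]
changes = changed_targets(old, new)
```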


16453c19192
< <http://kulturarvsdata.se/raa/kmb/16001000534540> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21000000981008> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000534540> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000014988> .
16456c19195
< <http://kulturarvsdata.se/raa/kmb/16001000534542> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21000000981008> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000534542> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000014988> .
16462c19201
< <http://kulturarvsdata.se/raa/kmb/16001000534552> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21000000981008> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000534552> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000014988> .

In this case, Brahehuset in Ystad has changed its identifier in BeBR (from http://kulturarvsdata.se/raa/bbr/21000000981008 to http://kulturarvsdata.se/raa/bbr/21300000014988) and the three images depicting it have been updated accordingly. It is, of course, unpardonable that an object should change its permanent URI; however, this is, unfortunately, a known and very embarrassing issue with the current version of BeBR which we are working to mitigate.


16722c19513
< <http://kulturarvsdata.se/raa/kmb/16001000538560> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538560> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16724c19515
< <http://kulturarvsdata.se/raa/kmb/16001000538562> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538562> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16726c19517
< <http://kulturarvsdata.se/raa/kmb/16001000538564> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538564> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16728c19519
< <http://kulturarvsdata.se/raa/kmb/16001000538566> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538566> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16730c19521
< <http://kulturarvsdata.se/raa/kmb/16001000538568> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538568> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16732c19523
< <http://kulturarvsdata.se/raa/kmb/16001000538570> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538570> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16734c19525
< <http://kulturarvsdata.se/raa/kmb/16001000538574> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538574> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16736c19527
< <http://kulturarvsdata.se/raa/kmb/16001000538576> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538576> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16738c19529
< <http://kulturarvsdata.se/raa/kmb/16001000538578> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538578> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16740c19531
< <http://kulturarvsdata.se/raa/kmb/16001000538580> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538580> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16742c19533
< <http://kulturarvsdata.se/raa/kmb/16001000538582> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538582> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16744c19535
< <http://kulturarvsdata.se/raa/kmb/16001000538584> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538584> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16746c19537
< <http://kulturarvsdata.se/raa/kmb/16001000538586> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538586> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16748c19539
< <http://kulturarvsdata.se/raa/kmb/16001000538588> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538588> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .
16750c19541
< <http://kulturarvsdata.se/raa/kmb/16001000538590> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/1300000013542> .
---
> <http://kulturarvsdata.se/raa/kmb/16001000538590> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21300000013542> .

In this case, Näsby Slott in Täby has changed its identifier in BeBR (from http://kulturarvsdata.se/raa/bbr/1300000013542 to http://kulturarvsdata.se/raa/bbr/21300000013542) and the fifteen images depicting it have been updated accordingly. Explanation and apologies as above. :(
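Since BeBR identifiers can move like this, any locally stored links need the same substitution. A minimal sketch, assuming a hand-maintained old-to-new table; the `BBR_MOVED` name is hypothetical and only the two moves reported here are filled in. It matches whole URIs rather than doing substring replacement:

```python
BBR = "http://kulturarvsdata.se/raa/bbr/"
# Hypothetical old -> new table; only the two moves reported in this task.
BBR_MOVED = {
    BBR + "21000000981008": BBR + "21300000014988",  # Brahehuset, Ystad
    BBR + "1300000013542": BBR + "21300000013542",   # Näsby Slott, Täby
}

def remap(uri, table=BBR_MOVED):
    """Follow old->new replacements until the URI is current; stops on a cycle."""
    seen = set()
    while uri in table and uri not in seen:
        seen.add(uri)
        uri = table[uri]
    return uri

def remap_triple(s, p, o, table=BBR_MOVED):
    """Apply the table to a (subject, predicate, object) triple."""
    return remap(s, table), p, remap(o, table)
```

Following chains means an identifier that moves twice still resolves to its latest value, and the `seen` set stops a mistaken circular entry from looping forever.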


Deletions:

The following triples have been removed:

10629,10632d12058
< <http://kulturarvsdata.se/raa/kmb/16001000011191> <http://kulturarvsdata.se/ksamsok#mediaLicense> <http://creativecommons.org/licenses/by/2.5/> .
< <http://kulturarvsdata.se/raa/kmb/16001000011191> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21400000209820> .
< <http://kulturarvsdata.se/raa/kmb/16001000011193> <http://kulturarvsdata.se/ksamsok#mediaLicense> <http://creativecommons.org/licenses/by/2.5/> .
< <http://kulturarvsdata.se/raa/kmb/16001000011193> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/bbr/21400000209820> .

These triples have been removed because the subject of the photographs was misattributed: they do not, in fact, depict the buildings they were thought to depict in February. Note that the licenses for the two images remain unchanged, so there is no need to remove them from Commons.


16131,16132d18869
< <http://kulturarvsdata.se/raa/kmb/16001000512964> <http://kulturarvsdata.se/ksamsok#mediaLicense> <http://creativecommons.org/publicdomain/mark/1.0/> .
< <http://kulturarvsdata.se/raa/kmb/16001000512964> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10153200700001> .
16149,16150d18885
< <http://kulturarvsdata.se/raa/kmb/16001000513184> <http://kulturarvsdata.se/ksamsok#mediaLicense> <http://creativecommons.org/publicdomain/mark/1.0/> .
< <http://kulturarvsdata.se/raa/kmb/16001000513184> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10156400120001> .
16159,16160d18893
< <http://kulturarvsdata.se/raa/kmb/16001000513364> <http://kulturarvsdata.se/ksamsok#mediaLicense> <http://creativecommons.org/publicdomain/mark/1.0/> .
< <http://kulturarvsdata.se/raa/kmb/16001000513364> <http://kulturarvsdata.se/ksamsok#visualizes> <http://kulturarvsdata.se/raa/fmi/10157000470001> .

These three images appear to have been removed from the public KMB website, and thus also from K-samsök. I can only assume that it has been determined, since February, that they are not quite as Public Domain as we had previously thought. You may therefore wish to consider removing them from Commons.

Many thanks for the update.

The diff format will work fine for the future =)

The overwhelming majority of the changes concern additions since 2nd February 2017. In most cases this is likely to be due to the ongoing clearing work being done to assign correct licenses to images in the KMB catalogue. Approximately 1423 images have been added.

Thanks. I'll trigger a new upload for these images.

In this case, the FMIS-id was invalid, and has been fixed at the source. You may recall that this was addressed earlier.

These were fixed before the upload so all is well.


In this case, a series of seven images that were thought to depict the ancient monument (lämning) Ramdala 12:1 have been corrected to reflect the fact that they actually depict Jämjö 12:1.

These were fixed before the upload started so the uploaded data is correct.


In this case, Brahehuset in Ystad has changed its identifier in BeBR (from http://kulturarvsdata.se/raa/bbr/21000000981008 to http://kulturarvsdata.se/raa/bbr/21300000014988) and the three images depicting it have been updated accordingly. It is, of course, unpardonable that an object should change its permanent URI; however, this is, unfortunately, a known and very embarrassing issue with the current version of BeBR which we are working to mitigate.

Unpersisting persistent identifiers makes me sad =(
But these were also fixed before the upload started so no need for action on our end (also checked Wikidata to ensure it's ok there)


In this case, Näsby Slott in Täby has changed its identifier in BeBR (from http://kulturarvsdata.se/raa/bbr/1300000013542 to http://kulturarvsdata.se/raa/bbr/21300000013542) and the fifteen images depicting it have been updated accordingly. Explanation and apologies as above. :(

These were also fixed before the upload started so no need for action on our end (also checked Wikidata to ensure it's ok there)


These triples have been removed because the subject of the photographs was misattributed: they do not, in fact, depict the buildings they were thought to depict in February. Note that the licenses for the two images remain unchanged, so there is no need to remove them from Commons.

I've marked these files for renaming and updated their metadata.


These three images appear to have been removed from the public KMB website, and thus also from K-samsök. I can only assume that it has been determined, since February, that they are not quite as Public Domain as we had previously thought. You may therefore wish to consider removing them from Commons.

These were removed before the upload started and so were never uploaded.

Many thanks for the update.

No problem. :)

The diff format will work fine for the future =)

Excellent. How frequently would you like updates, and for how long? Once every six months, or more/less often?

Unpersisting persistent identifiers makes me sad =(

It is a right pain in the arse, and quite an embarrassing bug for us. Unfortunately, the causes are built into how the BeBR system was designed, so it may not be easy to fix.