Decide how the WD items should be sourced
Closed, Resolved · Public

Assigned To

	Lokal_Profil

Authored By

	Alicia_Fagerving_WMSE
	Jan 13 2017, 12:46 PM

Description

When adding claims to Wikidata items, it would be great to source (ideally) each and every one of them. As I see it, there are two options:

Use Wikipedia as a source. This is less than ideal in terms of reliability, but it's straightforward to do (and while it's not recommended, it's common practice, for better or worse). Every item in the WLM database contains information about the Wikipedia page it was fetched from, for example //sv.wikipedia.org/w/index.php?title=Lista_%C3%B6ver_arbetslivsmuseer_i_Blekinge_l%C3%A4n&oldid=30834404. As this includes the page revision ID, it's easy to link to the correct version.
Use the registrant_url value. For example, in the Norwegian building data, each item has a URL pointing to the Kulturminnesøk service. The advantage is that it's a reliable, official data source. There are two disadvantages:
1. The WLM database comprises data downloaded from Wikipedia pages, which have been edited by the community. There's no way of knowing which information is supported by the registrant_url and which was added manually by someone.
2. Many of the data sets don't even have a registrant_url.
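Since each WLM row records the Wikipedia page revision it was harvested from, turning that into a reference URL is mostly string handling. A minimal Python sketch, assuming the stored URL always has the `title=<title>&oldid=<revision id>` shape of the example above; `source_permalink` is a hypothetical helper, not part of the WLM tooling.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit


def source_permalink(source: str) -> Optional[str]:
    """Turn a stored Wikipedia source URL into an https permalink (hypothetical helper).

    Assumes the URL has the shape of the example above:
    //<lang>.wikipedia.org/w/index.php?title=<title>&oldid=<revision id>
    Returns None when no numeric oldid is present.
    """
    parts = urlsplit(source)  # also parses protocol-relative "//host/..." URLs
    query = parse_qs(parts.query)  # values come back percent-decoded
    title = query.get("title", [""])[0]
    oldid = query.get("oldid", [""])[0]
    if not (title and oldid.isdigit()):
        # Without a revision id we cannot point at the exact version used.
        return None
    # Re-encode the title so non-ASCII characters (ö, ä, ...) are URL-safe.
    return f"https://{parts.netloc}/w/index.php?title={quote(title)}&oldid={oldid}"
```

Returning `None` for URLs without an `oldid` keeps the caller from citing a live page whose content may have changed since the import.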

In the end, pointing back to the Wikipedia page certainly seems better than nothing. In some cases we do know where the data on Wikipedia came from _originally_ -- such as the Swedish museum dataset -- but again, there's always a possibility that the Wikipedia page contains info added by someone manually. The WLM database is updated continuously, so it contains the freshest dump of whatever is included in the Wikipedia page. This makes it tricky to guess which statements are supported by the "official" sources.

Related Objects

Mentioned In: T156741: Create item for the ArbetSam database
T158049: Identify steps involved in migrating a dataset
T156108: Decide how to deal with datasets without any external references

Event Timeline

Alicia_Fagerving_WMSE created this task. · Jan 13 2017, 12:46 PM

Restricted Application added a subscriber: Aklapper. · Jan 13 2017, 12:46 PM

Jopparn assigned this task to Lokal_Profil. · Jan 21 2017, 4:02 AM

Alicia_Fagerving_WMSE mentioned this in T156108: Decide how to deal with datasets without any external references. · Jan 24 2017, 7:27 AM

There is a third alternative, which is to source it to the monuments database itself (imported from: WLM database), with country + id + date as extra info. This ends up being largely equivalent to imported from: Wikipedia.

I would say that as a default we use:
- imported from: Wikipedia
- url: permalink
(this would, of course, be omitted as soon as we have a better reference)
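The proposed default could look roughly like this. The reference is modelled as a plain dict from property ID to value, which is a simplification and not the Wikibase JSON format. Mapping "url" to reference URL (P854) is an assumption, the QID of the relevant Wikipedia edition is supplied by the caller, and both function names are hypothetical.

```python
from typing import Dict, Optional

# Simplified reference: property id -> value (NOT the Wikibase JSON layout).
Reference = Dict[str, str]

IMPORTED_FROM = "P143"  # imported from Wikimedia project
REFERENCE_URL = "P854"  # reference URL (assumed mapping for "url")


def default_reference(wikipedia_qid: str, permalink: str) -> Reference:
    """Fallback reference: imported from <Wikipedia edition> + permalink."""
    return {IMPORTED_FROM: wikipedia_qid, REFERENCE_URL: permalink}


def choose_reference(
    better: Optional[Reference], wikipedia_qid: str, permalink: str
) -> Reference:
    """Use a better reference when one exists; otherwise fall back to the default."""
    if better:
        return better
    return default_reference(wikipedia_qid, permalink)
```

Keeping the fallback in one place means that once a dataset has been investigated and a better reference found, only the `better` argument changes.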

For individual datasets, we could spend a little time investigating (a separate task per batch), or ask the relevant communities to investigate, which properties could be sourced better (be it through registrant_url or some other type of reference). True, the data may have been edited since the initial import, but that is also true for normal data on Wikidata.

Jopparn moved this task from Backlog to In process on the Connected-Open-Heritage-Wikidata-migration board. · Jan 30 2017, 10:39 AM

Restricted Application added a subscriber: jhsoby. · Jan 30 2017, 10:39 AM

An outcome of this task would be to create a flow chart of the different steps.

Note that as per https://www.wikidata.org/wiki/Wikidata:Bots, each and every statement added using a bot account will have to be sourced.
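To check a batch against that rule before uploading, something like the following could flag unsourced statements. The item layout used here (property ID mapped to a list of statement dicts, each with a `references` list) is a hypothetical simplification, not the Wikibase data model, and `unsourced_properties` is an illustrative name.

```python
from typing import Dict, List

# Simplified item: property id -> list of statements; each statement is a
# dict with a "references" list. Hypothetical layout, not Wikibase JSON.
Item = Dict[str, List[dict]]


def unsourced_properties(item: Item) -> List[str]:
    """Return (sorted) the properties with at least one statement lacking a reference."""
    return sorted(
        prop
        for prop, statements in item.items()
        if any(not st.get("references") for st in statements)
    )
```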

Alicia_Fagerving_WMSE added a subscriber: Lokal_Profil. · Jan 30 2017, 11:37 AM

First step: created an item for the Monuments database https://www.wikidata.org/wiki/Q28563569

Lokal_Profil added a comment. · Feb 1 2017, 5:34 PM

This comment was removed by Lokal_Profil.

In T155241#2984796, @Alicia_Fagerving_WMSE wrote:

First step: created an item for the Monuments database https://www.wikidata.org/wiki/Q28563569

Is this to be used together with P143? (when no better sources are available)
If so, it should be used (as a compared part of the reference) together with:

[[ https://www.wikidata.org/wiki/Property:P813 | P813 ]] = today (as a non-compared part of the reference)
[[ https://www.wikidata.org/wiki/Property:P854 | P854 ]] = https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=<namespace>&srlanguage=<lang>&srid=<unique_id>, where e.g. <namespace> = se-bbr, <lang> = sv and <unique_id> = 21000001001755 (as a compared part of the reference)
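A sketch of how the P854 value could be generated, and of how the "compared" and "non-compared" parts might be used when checking whether an item already carries the same reference. Reading "compared" as "taken into account when deciding whether two references are equal" is our interpretation; the function names are hypothetical and references are simplified dicts.

```python
from urllib.parse import urlencode

API = "https://tools.wmflabs.org/heritage/api/api.php"

# Properties that decide whether two references are "the same".
# P813 (retrieved) is deliberately left out: it changes on every run.
COMPARED = ("P143", "P854")


def monuments_api_url(namespace: str, lang: str, unique_id: str) -> str:
    """Build the P854 value for one entry in the monuments database."""
    params = {
        "action": "search",
        "format": "json",
        "srcountry": namespace,
        "srlanguage": lang,
        "srid": unique_id,
    }
    return f"{API}?{urlencode(params)}"


def same_reference(a: dict, b: dict) -> bool:
    """True if two simplified references agree on every compared property."""
    return all(a.get(p) == b.get(p) for p in COMPARED)
```

With this split, re-running the import on a later day produces a reference that only differs in P813, so it is recognised as already present instead of being added a second time.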

Alicia_Fagerving_WMSE mentioned this in T158049: Identify steps involved in migrating a dataset. · Feb 20 2017, 9:20 AM

Lokal_Profil mentioned this in T156741: Create item for the ArbetSam database. · Feb 23 2017, 4:02 PM

Lokal_Profil added a project: User-LokalProfil. · Mar 21 2017, 5:45 PM

In T155241#2990449, @Lokal_Profil wrote:

In T155241#2984796, @Alicia_Fagerving_WMSE wrote:

First step: created an item for the Monuments database https://www.wikidata.org/wiki/Q28563569

Is this to be used together with P143? (when no better sources are available)
If so it should be used (as a compared part of the reference) together with:

[[ https://www.wikidata.org/wiki/P813 | P813 ]]= today (as a non-compared part of the reference)

[[ https://www.wikidata.org/wiki/P854 | P854 ]] = https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=<namespace>&srlanguage=<lang>&srid=<unique_id> where e.g <namespace>= se-bbr, <lang> = sv and <unique_id>=21000001001755 (as a compared part of the reference)

The links in your comment seem to be broken?

In T155241#3156073, @Jopparn wrote:

The links in your comment seem to be broken?

Repaired

So to try and summarise this (@Alicia_Fagerving_WMSE please point out if any of this is not what we are actually doing):

What we are de facto doing today is:

Image/commonscat claims:
- No source is added
All other claims:
- Are sourced as imported from the monuments database
  - P854: url_to_monuments_api_for_that_entry
  - P577: date_the_database_was_last_updated
  - P248: Monuments database
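The de facto scheme above, expressed as a small function. Assumptions: "image/commonscat claims" are taken to mean image (P18) and Commons category (P373), the date is stored as an ISO string, and Q28563569 is the Monuments database item created earlier in this task. `monuments_reference` is a hypothetical name and the reference is a simplified dict.

```python
from datetime import date
from typing import Dict, Optional

MONUMENTS_DATABASE = "Q28563569"  # item created for the Monuments database
UNSOURCED = {"P18", "P373"}  # image, Commons category: no source is added


def monuments_reference(
    prop: str, api_url: str, db_updated: date
) -> Optional[Dict[str, str]]:
    """Reference for a claim on `prop`, following the summary above.

    Returns None for image/commonscat claims, which are left unsourced.
    """
    if prop in UNSOURCED:
        return None
    return {
        "P854": api_url,  # reference URL: the API entry for this monument
        "P577": db_updated.isoformat(),  # date the database was last updated
        "P248": MONUMENTS_DATABASE,  # stated in: Monuments database
    }
```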

The se-arbetsl dataset did not have any registrant_url, and for se-fmis and se-bbr it coincides with the URL generated by the id property.
Note, however, that for se-ship (and similar future datasets) we should have made sure to include the registrant_url somehow, likely via a P973 claim.
Since the se-ship links were all broken at some point in the last month or so, we should not import these now.

Lokal_Profil moved this task from 📆 This week to 📥 Backlog on the User-LokalProfil board. · May 15 2017, 7:48 AM

In addition to the info in T155241#3234671:

When the registrant_url is identical to the link created by the id property, it is left out. Otherwise, it is added as a P973 qualifier on the heritage status property.
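The registrant_url rule can be written as a single check. This sketch assumes exact string comparison for "identical" (no normalisation of http/https or trailing slashes) and uses described at URL (P973), the property named for se-ship above. The returned dict is meant as the qualifier on the heritage designation (P1435) claim, which we assume is the "heritage status property"; the function name is hypothetical.

```python
from typing import Dict, Optional

DESCRIBED_AT_URL = "P973"


def registrant_qualifier(
    registrant_url: Optional[str], id_url: Optional[str]
) -> Optional[Dict[str, str]]:
    """Qualifier to add on the heritage designation (P1435) claim, if any.

    - no registrant_url: nothing to add
    - registrant_url identical to the URL built from the id property: left out
    - otherwise: add it as a "described at URL" qualifier
    """
    if not registrant_url or registrant_url == id_url:
        return None
    return {DESCRIBED_AT_URL: registrant_url}
```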

Restricted Application added a subscriber: jeblad. · Oct 6 2017, 7:51 AM

Lokal_Profil moved this task from 📥 Backlog to ☑️ Done on the User-LokalProfil board. · Oct 6 2017, 7:51 AM
