User story:
As a Wikidata editor,
I want to have a simple way of reporting errors in external ID values to the respective data sources
in order to fix these errors upstream in an efficient way.
As a curator of an external data source,
I want to get error reports from Wikidata in a structured way
in order to deal with them more systematically and efficiently.
Problem:
- The status quo for Wikidata editors is very time-consuming and often unproductive:
- language difficulties between user and institution
- each institution has a different process
- there is often no answer to such reports in practice
- The process is often not transparent:
- it might help if everyone could see the number of solved and unsolved errors per external data source (e.g. to avoid duplicate reports, or as an incentive)
Solution:
The primary goal should be a structured way of reporting errors for Wikidata editors.
Ideas:
Maybe the simplest way to solve this (maybe even without the need for coding):
- 'errors' wiki page (similar to e.g. https://de.wikipedia.org/wiki/Wikipedia:GND/Fehlermeldung)
- Property with a link to an existing 'errors' wiki page
- Property would be helpful even if only to track which properties have these
Some more elaborate features of a potential new tool:
- managing the databases should be allowed for both Wikidata editors and interested institutions (on which one or more external IDs on Wikidata are based)
- the institution should be able to solve the reports addressed to it (each employee of the institution can log in and solve issues)
- reports should be possible through the tool interface
- and ideally also directly from a Wikidata item through a dedicated gadget (e.g. a button next to each external ID value: "add mistake report")
- constraint violations are automatically added
- the automatic system should also include statements deprecated with qualifier P2241: Q29998666 (reason: error in the referenced source)
- possible improvement: give the institution some possibility to solve reports also semiautomatically
- maybe we can integrate this into Mismatch Finder (or at least use some code and infrastructure from it)
- Mismatch Finder also has external errors to report back (if something was reported by mistake)
- ideally, we could also report systematic errors that exist in a group of values of the external ID (e.g. in the form of a message)
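The automatic collection of deprecated statements mentioned above could start from a SPARQL query against the Wikidata Query Service. The following is only a minimal sketch: the function name `build_deprecated_query` and the `limit` parameter are hypothetical, and P9164 is used purely as an example property because it appears elsewhere in this task. The qualifier (P2241) and its value (Q29998666) are the ones named in the list above.

```python
import re

# Qualifier and value taken from the feature list above.
REASON_QUALIFIER = "P2241"
SOURCE_ERROR_REASON = "Q29998666"  # "error in the referenced source"


def build_deprecated_query(property_id: str, limit: int = 100) -> str:
    """Build a SPARQL query for statements of one external-ID property
    that are deprecated with the 'error in the referenced source' reason.

    Hypothetical helper: it only builds the query text and does not send it.
    """
    # Reject anything that is not a property ID so that arbitrary text
    # cannot be spliced into the query.
    if not re.fullmatch(r"P[1-9]\d*", property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    return (
        "SELECT ?item ?value WHERE {\n"
        f"  ?item p:{property_id} ?statement .\n"
        f"  ?statement ps:{property_id} ?value ;\n"
        "             wikibase:rank wikibase:DeprecatedRank ;\n"
        f"             pq:{REASON_QUALIFIER} wd:{SOURCE_ERROR_REASON} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_deprecated_query("P9164"))
```

The resulting text can be pasted into the query service to check what the automatic import would pick up for a given property before any tool is built.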
The basic tables could look similar to those of Mismatch Finder:
- Property
- External ID
- WD Item
- Status (can be changed by the institution)
- Comment by the reporter (why do we think this is wrong; automatic or manual)
Example table from Mismatch Finder
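To make the proposed columns more concrete, one report row could be modelled roughly as follows. This is a sketch under assumptions, not a schema decision: the class and field names, the three status values, and the sample IDs are all hypothetical. The small counting helper illustrates the "number of solved and unsolved errors" idea from the Problem section.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status values; the real set would be agreed with the institutions.
    OPEN = "open"
    SOLVED = "solved"
    REJECTED = "rejected"


@dataclass
class ErrorReport:
    property_id: str          # Property, e.g. "P9164"
    external_id: str          # External ID value that is believed to be wrong
    item_id: str              # WD Item carrying the value
    comment: str              # Comment by the reporter: why do we think this is wrong
    automatic: bool = False   # True if added automatically (e.g. constraint violation)
    status: Status = Status.OPEN  # Can be changed by the institution


def count_by_status(reports):
    """Return how many reports are in each status."""
    counts = {status: 0 for status in Status}
    for report in reports:
        counts[report.status] += 1
    return counts


if __name__ == "__main__":
    # Placeholder values only, not real reports.
    reports = [
        ErrorReport("P9164", "example-id-1", "Q1", "ID points to a different person"),
        ErrorReport("P9164", "example-id-2", "Q2", "format constraint violation",
                    automatic=True, status=Status.SOLVED),
    ]
    for status, number in count_by_status(reports).items():
        print(status.value, number)
```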
Notes:
- institutions often ask us to have a tool to monitor the changes done on "their" data (data that they entered, or data about their collections, or "their" external ID...), do you see this as part of the tool as well?
- this can be done already, e.g. https://www.wikidata.org/w/index.php?title=Special:RecentChangesLinked&target=Property:P9164&namespace=0&showlinkedto=1
- and there might be a tool that combines RC (recent changes) and a query from Magnus, but it was broken (?)
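The recent-changes link above follows a fixed pattern, so the same view can be generated for any property. A small sketch, where the function name `recent_changes_url` is hypothetical and the URL parameters are copied from the link above:

```python
from urllib.parse import urlencode


def recent_changes_url(property_id: str) -> str:
    """Build the Special:RecentChangesLinked URL for items using a property."""
    params = {
        "title": "Special:RecentChangesLinked",
        "target": f"Property:{property_id}",
        "namespace": 0,      # only changes to items (main namespace)
        "showlinkedto": 1,   # pages linking to the property, not linked from it
    }
    # safe=":" keeps the colons readable, as in the link above.
    return "https://www.wikidata.org/w/index.php?" + urlencode(params, safe=":")


if __name__ == "__main__":
    print(recent_changes_url("P9164"))
```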
Mockups:
Acceptance criteria:
Open questions:
- Do we need a tool, or could we solve this with a system of wiki pages and properties?
- What workflow would work well for curators of external data sources?
- Should old errors remain on Wikidata (with a suitable deprecation reason) after they have been fixed in the referenced source?
- maybe they should, especially if the reference has an access date (the tool would need to be aware of this)
Community communication:
Who we need to keep in the loop and in what way:
Who this could be interesting for and in what way:
Original:
Created in a working session at the Wikidata Data Quality Days 2022.