
Add provenance information (who edited which statements) to Wikidata
Open, Medium, Public

Description

Right now there is no easy way to look up

  • by whom and when a given statement was last introduced (i.e. which edit to an item added it)
  • whether some statement or statement pattern (e.g. use of a property) has ever appeared in the version history of an item

To answer these questions, one has to crawl the version history and compare revisions. The provenance information on Wikidata statements should ideally be available both via SPARQL and in the user interface.
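
To illustrate what such a crawl involves, here is a minimal Python sketch. It assumes the revisions have already been fetched and are given as dicts with `revid`, `user`, `timestamp` and the item JSON under `entity`; that input shape and the function names are assumptions made for this example, not an existing API. Statement IDs are read from the `claims` section of the item JSON.

```python
from __future__ import annotations


def statement_ids(entity: dict) -> set[str]:
    """Collect the IDs of all statements in one revision of an item."""
    return {
        statement["id"]
        for statements in entity.get("claims", {}).values()
        for statement in statements
    }


def last_introduced(revisions: list[dict], statement_id: str) -> dict | None:
    """Return the most recent revision that contains the statement
    while the preceding revision did not (None if it never appears)."""
    introduced = None
    present_before = False
    for rev in sorted(revisions, key=lambda r: r["revid"]):
        present_now = statement_id in statement_ids(rev["entity"])
        if present_now and not present_before:
            introduced = rev
        present_before = present_now
    return introduced


def property_ever_used(revisions: list[dict], prop: str) -> bool:
    """True if any revision has at least one statement for the property."""
    return any(rev["entity"].get("claims", {}).get(prop) for rev in revisions)


# Hypothetical, hand-written history of one item (not real Wikidata data).
SAMPLE_REVISIONS = [
    {"revid": 1, "user": "Alice", "timestamp": "2016-06-01T10:00:00Z",
     "entity": {"claims": {}}},
    {"revid": 2, "user": "Bob", "timestamp": "2016-06-02T08:09:00Z",
     "entity": {"claims": {"P31": [{"id": "Q1$abc"}]}}},
    {"revid": 3, "user": "Carol", "timestamp": "2016-06-03T09:00:00Z",
     "entity": {"claims": {"P31": [{"id": "Q1$abc"}],
                           "P21": [{"id": "Q1$def"}]}}},
    {"revid": 4, "user": "Dave", "timestamp": "2016-06-04T09:00:00Z",
     "entity": {"claims": {"P31": [{"id": "Q1$abc"}]}}},
]

hit = last_introduced(SAMPLE_REVISIONS, "Q1$abc")
print(hit["user"], hit["timestamp"])
print(property_ever_used(SAMPLE_REVISIONS, "P21"))
```

Every such lookup has to walk the full history of the item, which is why precomputed provenance data would help.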

A first (but not final) step could be an extension of the RDF data model as follows:

?stm a wikibase:Statement .
?stm prov:generatedAtTime ?time .
?stm schema:creator ?user .
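
As a rough sketch of how an external tool might emit these three triples, the Python snippet below builds Turtle lines for one statement. The function name is made up for this example, and the user URI is only a placeholder, since which URI to use for an account is still an open question. The statement URI assumes the form used in the RDF export, with the `$` of the statement ID replaced by `-`.

```python
STATEMENT_NS = "http://www.wikidata.org/entity/statement/"

# Prefixes the emitted lines rely on.
TURTLE_PREFIXES = """\
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def provenance_triples(statement_id: str, user_uri: str, timestamp: str) -> list:
    """Build the three proposed provenance triples for one statement."""
    # Assumption: statement node URIs use "-" where the statement ID has "$".
    stm = f"<{STATEMENT_NS}{statement_id.replace('$', '-')}>"
    return [
        f"{stm} a wikibase:Statement .",
        f'{stm} prov:generatedAtTime "{timestamp}"^^xsd:dateTime .',
        f"{stm} schema:creator <{user_uri}> .",
    ]


# Placeholder account URI; the task leaves the real choice open.
triples = provenance_triples(
    "Q1$abc",
    "https://www.wikidata.org/wiki/User:Bob",
    "2016-06-02T08:09:00Z",
)
print(TURTLE_PREFIXES + "\n".join(triples))
```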

This could later be extended by modeling edit events as RDF resources (similar to https://www.mediawiki.org/wiki/Extension:SemanticHistory):

?stm prov:qualifiedGeneration ?edit .
  ?edit schema:creator ?user .
  ?edit prov:atTime ?time .
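
One possible benefit of the edit-event model is that the user and time are stated once per edit, however many statements that edit introduced. The Python sketch below illustrates this under two assumptions of mine: the edit is identified by a `Special:Diff/<revid>` URL and the account by its user page URL. Neither is an agreed convention, and the function name is hypothetical.

```python
STATEMENT_NS = "http://www.wikidata.org/entity/statement/"


def edit_event_triples(revid: int, user_uri: str, timestamp: str,
                       statement_ids: list) -> list:
    """Build Turtle lines linking statements to one shared edit resource."""
    # Assumption: the diff URL of the revision serves as the edit's URI.
    edit = f"<https://www.wikidata.org/wiki/Special:Diff/{revid}>"
    lines = []
    for statement_id in statement_ids:
        stm = f"<{STATEMENT_NS}{statement_id.replace('$', '-')}>"
        lines.append(f"{stm} prov:qualifiedGeneration {edit} .")
    # Creator and time are attached to the edit, once.
    lines.append(f"{edit} schema:creator <{user_uri}> .")
    lines.append(f'{edit} prov:atTime "{timestamp}"^^xsd:dateTime .')
    return lines


lines = edit_event_triples(
    2,
    "https://www.wikidata.org/wiki/User:Bob",  # placeholder account URI
    "2016-06-02T08:09:00Z",
    ["Q1$abc", "Q1$def"],
)
print("\n".join(lines))
```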

I'd suggest at least agreeing on which RDF properties to use to connect a statement with the time of its creation (e.g. prov:generatedAtTime) and with a wiki account (e.g. schema:creator, but what's the URI of a given account?), so that the data can be generated independently from the version history unless it's included in Wikibase.

I don't know yet how to handle edits on qualifiers and references. Adding or modifying a reference could make an editor a schema:contributor of the referenced statement, but adding or removing a qualifier is a more substantial contribution. Do the URIs of statement nodes in RDF stay the same if qualifiers are changed?

Event Timeline

nichtich created this task. Jun 2 2016, 8:09 AM
Restricted Application added subscribers: Zppix, Aklapper. Jun 2 2016, 8:09 AM
Lydia_Pintscher triaged this task as Lowest priority. Apr 20 2017, 2:02 PM
Akuckartz added a subscriber: Akuckartz.
Akuckartz claimed this task. Aug 8 2020, 4:53 PM
Akuckartz raised the priority of this task from Lowest to Medium.

I am trying to work on this, but cannot promise much. @nichtich Are you still interested in this issue?

T206560#6387330 (RDF* / SPARQL*) may be relevant.