
Add possibility to check constraints on unsaved statements
Open, Needs Triage · Public · 5 Story Points

Description

There are at least two third-party implementations of constraint checks – HarvestTemplates (code) and OpenRefine (code) – which check constraints on some data before adding it to Wikidata. It would be nice if we supported this in WikibaseQualityConstraints, so that third parties don’t have to re-implement everything.

Technically, this means adding support for passing JSON snippets (snaks, full statements, or full entities?) instead of entity or statement IDs into wbcheckconstraints. (I think that makes more sense than a separate API module.)
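
A minimal sketch of what such a request might look like, assuming the statement variant is chosen. The `statements` parameter is hypothetical (it illustrates the proposal and is not an existing API parameter), and the snippet only builds the parameter dictionary without sending anything:

```python
import json


def build_check_request(statement: dict, action: str = "wbcheckconstraints") -> dict:
    """Build API parameters for checking one unsaved statement.

    NOTE: the "statements" parameter is hypothetical. The module currently
    accepts entity or statement IDs, not JSON snippets.
    """
    return {
        "action": action,
        "format": "json",
        # Serialize the unsaved statement(s) as a compact JSON array.
        "statements": json.dumps([statement], separators=(",", ":")),
    }


# An unsaved statement in Wikibase JSON. It has no "id" key, because an ID
# is only assigned once the statement has been saved.
unsaved_statement = {
    "type": "statement",
    "rank": "normal",
    "mainsnak": {
        "snaktype": "value",
        "property": "P31",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": 5, "id": "Q5"},
        },
    },
}

params = build_check_request(unsaved_statement)
```

A bare statement does not say which entity it belongs to, which is one reason the choice between snaks, statements, and full entities matters for the design.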

@Pintoch perhaps we can work on this during Wikimedia-Hackathon-2018 – I’ll add support to WikibaseQualityConstraints and you try to use it in OpenRefine? :)

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 8 2018, 6:37 PM
Agabi10 added a subscriber: Agabi10. · May 8 2018, 9:32 PM
Lucas_Werkmeister_WMDE set the point value for this task to 5. · May 15 2018, 1:31 PM

Once this is supported by the Wikibase API, it makes sense to build support for it directly into Wikidata-Toolkit. This is something that would be massively useful for many people.

As for OpenRefine, we need to brainstorm a bit to work out exactly when these calls should be made. Currently, constraint validation is run on the entire batch every time a change is made. There could be two sorts of constraint checks: the ones we run in real time because they are cheap, and the ones we perform just before the edits are made… but then they would need to be reported in a different way?
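
A rough sketch of that two-phase split, to make the idea concrete. Everything here is hypothetical: the check functions, the `ALREADY_ON_WIKIDATA` stand-in for a remote lookup, and the phase labels are illustrations, not OpenRefine code. Each warning is tagged with the phase that produced it, so the two kinds could be reported differently:

```python
from typing import Callable, Dict, List, Tuple

Statement = Dict[str, str]  # {"subject": ..., "property": ..., "value": ...}
Check = Callable[[List[Statement]], List[str]]

# Hypothetical stand-in for data that a real check would fetch remotely.
ALREADY_ON_WIKIDATA = {("Q42", "P569")}


def empty_value_check(batch: List[Statement]) -> List[str]:
    """Cheap: needs only the batch itself, so it can run on every change."""
    return [f"{s['subject']} {s['property']}: empty value"
            for s in batch if not s["value"]]


def conflicting_value_check(batch: List[Statement]) -> List[str]:
    """Expensive in practice: would need a remote call per batch."""
    return [f"{s['subject']} {s['property']}: value already exists"
            for s in batch
            if (s["subject"], s["property"]) in ALREADY_ON_WIKIDATA]


CHEAP_CHECKS: List[Check] = [empty_value_check]
EXPENSIVE_CHECKS: List[Check] = [conflicting_value_check]


def run_checks(batch: List[Statement], checks: List[Check],
               phase: str) -> List[Tuple[str, str]]:
    """Run the given checks and tag every warning with its phase."""
    return [(phase, warning) for check in checks for warning in check(batch)]


def on_change(batch: List[Statement]) -> List[Tuple[str, str]]:
    """Called after every edit to the batch: cheap checks only."""
    return run_checks(batch, CHEAP_CHECKS, "live")


def before_upload(batch: List[Statement]) -> List[Tuple[str, str]]:
    """Called once before the edits are made: cheap and expensive checks."""
    return on_change(batch) + run_checks(batch, EXPENSIVE_CHECKS, "pre-upload")


batch = [
    {"subject": "Q42", "property": "P569", "value": "1952-03-11"},
    {"subject": "Q42", "property": "P19", "value": ""},
]
live_warnings = on_change(batch)
upload_warnings = before_upload(batch)
```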

So it's not just a backend issue: we also need to understand how this will be presented to the user.

Other random thoughts:

  • for some constraints (such as inverse constraints) we need to look at the entire edit batch, not just an individual statement change, to detect if a violation will be triggered (so caching stuff is tricky)
  • we can probably make some assumptions about the batches (for instance, assume that an edit batch will not make any changes to the P279 type hierarchy of the types involved in its constraints)
  • I'm not familiar with the testing infrastructure for Wikidata. In OpenRefine, everything is pretty much hardcoded to use wikidata.org and not test.wikidata.org. Moving to another instance would require cleaning that up (and this must be done for the openrefine-wikidata reconciliation service first…) In the long term, OpenRefine should just have a Wikibase extension and be agnostic to the instance, but we are quite far from that.
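
To illustrate the first point, here is a small sketch of an inverse check that looks at the whole batch rather than at one statement. The triple representation and the `missing_inverses` helper are assumptions made for the example, and the item IDs are placeholders. P361 (part of) and P527 (has part) serve as the inverse pair:

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)


def missing_inverses(batch: List[Triple],
                     inverse_of: Dict[str, str],
                     existing: Iterable[Triple] = ()) -> List[Triple]:
    """Return the inverse statements that are in neither the batch nor `existing`."""
    known = set(batch) | set(existing)
    missing = []
    for subject, prop, value in batch:
        inverse = inverse_of.get(prop)
        if inverse is None:
            continue  # this property has no inverse constraint
        expected = (value, inverse, subject)
        if expected not in known:
            missing.append(expected)
    return missing


inverse_of = {"P527": "P361", "P361": "P527"}
batch = [
    ("Q1", "P527", "Q2"),
    ("Q2", "P361", "Q1"),  # inverse of the first statement, same batch
    ("Q1", "P527", "Q3"),  # its inverse is not in the batch
]

whole_batch = missing_inverses(batch, inverse_of)
single_statement = missing_inverses(batch[:1], inverse_of)
```

Checked as a whole, the batch is missing only the inverse of the third statement. Checked alone, the first statement would be flagged too, even though the second statement in the same batch resolves it.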
Vvjjkkii renamed this task from Add possibility to check constraints on unsaved statements to 1cdaaaaaaa. · Jul 1 2018, 1:11 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed the point value for this task.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 1cdaaaaaaa to Add possibility to check constraints on unsaved statements. · Jul 1 2018, 3:17 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot set the point value for this task to 5.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Jc86035 added a subscriber: Jc86035.
abian added a subscriber: abian. · Oct 15 2019, 5:16 PM