
Query constraint violations with WDQS
Closed, Resolved · Public

Description

Now that the constraints are machine-readable on property pages, it is a charming idea to query constraint violations on the fly using the Wikidata Query Service. It should be possible to identify items based on various criteria, such as:

  1. French painters with any type of constraint violation on any claim
  2. French painters with a specific constraint violation (e.g. single-value constraint) on any claim
  3. French painters with any type of constraint violation on a claim of a specific property (e.g. P569)
  4. French painters with a specific constraint violation (e.g. single-value constraint) on a claim of a specific property (e.g. P569)
  5. And so on…

Right now we only have the possibility to work on all constraint violations of a given property (via covi pages), but it is not possible to limit this to sets of items in a given topical area (such as “French painters”).

It should also be possible to equip the constraint boxes on property talk pages with SPARQL links that way. Right now we have to wait for daily KrBot updates, which slows down maintenance occasionally.

The idea of this task is not new and has already been mentioned by several users, but I could not find an existing Phabricator task. Still, I hope it is not a duplicate.

Event Timeline

Restricted Application added a project: Discovery. · Aug 3 2017, 12:35 PM
Restricted Application added subscribers: PokestarFan, Aklapper.
Esc3300 updated the task description. · Aug 3 2017, 3:14 PM
Esc3300 added a subscriber: Esc3300. · Aug 3 2017, 3:20 PM

I inserted numbering in the task description.

I think (4.) could be done easily with the SPARQL query links on property talk pages.

Maybe (2.) and (3.) could be done by some combination of queries, but that is likely to time out.

On the other hand, if you start from a list of items, you could try to do queries to check constraints from there. Many WikiProjects already have (Listeria) reports that do just that.

Oh, forget what I just wrote.

Maybe a new service for WDQS would be the easiest solution.

I thought of a new SERVICE as well, but I actually do not really have a preference how this should be implemented. Devs will find the best solution, I guess…

Smalyshev added a subscriber: Smalyshev. · Edited · Aug 3 2017, 8:17 PM

If there is a MediaWiki API that checks constraints, that would be the best way to integrate: we just add it to the supported APIs and use it. Note, however, that the MWAPI service currently cannot group queries, i.e. if it is used with the results of other patterns, it would run one query per item. Thus, there are three ways to approach this:

  1. Have a query for "all French painters" and then check each one for constraint violations (may be slow).
  2. Have an API for "all single-value constraint violations" and then apply a "French painter" filter to it. Again, this may be slow if there are a lot of constraint violations.
  3. Wait until I implement grouping for the MWAPI service and have an API that allows several IDs to be specified for constraint violation checks.
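
To make the difference between options 1 and 3 concrete, here is a minimal sketch that only builds request URLs and sends nothing. The endpoint, the `wbcheckconstraints` module name, the `|`-separated `id` syntax, and the batch size of 50 are all assumptions for illustration, not something confirmed in this task.

```python
from typing import Iterable, Iterator
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"  # assumed endpoint


def batches(item_ids: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` item IDs."""
    group: list[str] = []
    for item_id in item_ids:
        group.append(item_id)
        if len(group) == size:
            yield group
            group = []
    if group:
        yield group


def request_urls(item_ids: Iterable[str], size: int = 1) -> list[str]:
    """Build one constraint-check URL per group (no request is sent)."""
    return [
        API_URL + "?" + urlencode({
            "action": "wbcheckconstraints",  # assumed API module name
            "format": "json",
            "id": "|".join(group),           # assumed multi-ID syntax
        })
        for group in batches(item_ids, size)
    ]


painters = [f"Q{n}" for n in range(1, 121)]  # 120 placeholder IDs, not real painters
per_item = request_urls(painters, size=1)    # option 1: one request per item
grouped = request_urls(painters, size=50)    # option 3: hypothetical batch size
print(len(per_item), len(grouped))           # 120 3
```

Grouping only reduces the number of round trips; it does not make the individual checks cheaper.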

@Smalyshev I don’t think the lack of grouping support in MWAPI is a big problem here – constraint checks are just expensive at the moment, so I expect such queries would take a long time regardless of whether the actual API requests are grouped or not.

Yes, according to last week's stats, the execution time of each individual(!) check can be up to 8 seconds.
I am afraid we cannot run checks within a query.

The feature itself sounds very useful to me!
Maybe we can store the results of checks that were executed somewhere, so they could be queried afterwards.
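
A rough back-of-the-envelope sketch of why live checks inside a query look impractical. Only the 8-second worst case comes from the stats mentioned above; the 60-second query time limit, the item count, and the assumption that checks run one after another are illustrative.

```python
SECONDS_PER_CHECK = 8   # worst case per check, as quoted above
QUERY_TIMEOUT = 60      # assumed query time limit in seconds


def worst_case_seconds(items: int, checks_per_item: int = 1) -> int:
    """Total time if every check hits the worst case and runs sequentially."""
    return items * checks_per_item * SECONDS_PER_CHECK


def checks_within_timeout(timeout: int = QUERY_TIMEOUT) -> int:
    """How many worst-case checks fit into one query before it times out."""
    return timeout // SECONDS_PER_CHECK


print(checks_within_timeout())   # 7
print(worst_case_seconds(1000))  # 8000 seconds for 1000 hypothetical items
```

Under these assumptions only a handful of items could be checked per query, which is what makes querying stored results attractive.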

Smalyshev triaged this task as Normal priority.

See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Constraints (currently enabled only for the internal cluster; it will soon be enabled on the public one too).
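
For illustration, case 4 from the task description might then be expressed roughly as below. This is a sketch, not a tested query: the predicate `wikibase:hasViolationForConstraint` and the modelling of constraints as `P2302` statements are assumptions about the linked format, and all item IDs except P569 are assumed as well. The helper `violation_query` is hypothetical and only assembles the query text.

```python
def violation_query(
    property_id: str = "P569",           # date of birth, from the task description
    constraint_type: str = "Q19474404",  # assumed ID of the single-value constraint
    occupation: str = "Q1028181",        # assumed ID for "painter"
    country: str = "Q142",               # assumed ID for France
) -> str:
    """Return SPARQL text for items whose statements violate one constraint."""
    return f"""\
SELECT ?item ?statement WHERE {{
  ?item wdt:P106 wd:{occupation} ;
        wdt:P27 wd:{country} ;
        p:{property_id} ?statement .
  wd:{property_id} p:P2302 ?constraint .
  ?constraint ps:P2302 wd:{constraint_type} .
  ?statement wikibase:hasViolationForConstraint ?constraint .
}}
LIMIT 100"""


print(violation_query())
```

Dropping the `ps:P2302` line would correspond to case 3 (any constraint type on one property). Note that this only finds violations whose check results have already been stored, not ones computed on the fly.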

Smalyshev closed this task as Resolved.