
Expose constraint violations to WDQS using event queue
Open, Medium, Public, 13 Estimated Story Points

Description

As a user I would like to see all constraint violations in WDQS, so I can fix them.

GIVEN a constraint check is executed
WHEN a constraint violation is detected
THEN I should be able to query it with SPARQL.

GIVEN a constraint check is executed
WHEN no constraint violations are detected
THEN constraint violations should no longer show up when queried with SPARQL.
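
As a rough illustration of the first acceptance criterion, the helper below builds a SPARQL query listing the statements of one item that currently carry a violation. It is only a sketch: the function name `violations_query` is hypothetical, and it assumes violations are exposed on statement nodes through the `wikibase:hasViolationForConstraint` predicate, with the `wd:` and `wikibase:` prefixes predefined by the query service.

```python
import re


def violations_query(entity_id: str, limit: int = 100) -> str:
    """Build a SPARQL query for statements of one entity that have violations.

    Assumption: violations are attached to statement nodes with the
    wikibase:hasViolationForConstraint predicate.
    """
    # Reject anything that is not a plain entity ID, so the ID can be
    # interpolated into the query text safely.
    if not re.fullmatch(r"[QPL][1-9][0-9]*", entity_id):
        raise ValueError(f"not an entity ID: {entity_id!r}")
    return (
        "SELECT ?statement ?constraint WHERE {\n"
        f"  wd:{entity_id} ?p ?statement .\n"
        "  ?statement wikibase:hasViolationForConstraint ?constraint .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )
```

After a check that finds no violations has been imported, the same query would be expected to return no rows for that item, which is the second criterion.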

NOTE: At the moment, constraint violations are only imported to WDQS if they are cached at the moment WDQS pulls the constraint violation RDF for an item. There is a race condition between the WDQS poller and the constraint check execution, which is why only a fraction of constraint violations are imported.

Once T189458: re-enable wdqs kafka poller is working, we can create a dedicated event for when a constraint check is finished, so the WDQS poller always pulls cached violations.

Tasks:

  • Create new event in the kafka data pipeline when a constraint check is finished. The event would include the entity ID that was being checked.
  • Listen to event in WDQS and pull the data for that specific item.
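
The two tasks above could fit together roughly as follows. This is a minimal sketch, not the actual pipeline: the topic name, the event fields, and the function names are all hypothetical, Kafka itself is left out (events are plain byte strings), and the RDF fetch is an injected callable because the task does not say how the data would be pulled.

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical topic name; the task does not specify one.
TOPIC = "constraint-check-finished"


def make_check_finished_event(entity_id: str, checked_at: str) -> bytes:
    """Producer side: serialize an event emitted when a check has finished."""
    event = {"topic": TOPIC, "entity_id": entity_id, "checked_at": checked_at}
    return json.dumps(event, sort_keys=True).encode("utf-8")


def entities_to_refresh(raw_events: Iterable[bytes]) -> List[str]:
    """Consumer side: distinct entity IDs in a batch, in first-seen order."""
    seen: Dict[str, None] = {}
    for raw in raw_events:
        event = json.loads(raw.decode("utf-8"))
        if event.get("topic") != TOPIC:
            continue  # ignore unrelated events sharing the stream
        seen.setdefault(event["entity_id"], None)
    return list(seen)


def process_batch(
    raw_events: Iterable[bytes],
    fetch_constraint_rdf: Callable[[str], str],
) -> Dict[str, str]:
    """Pull constraint RDF once per entity named in the batch."""
    return {eid: fetch_constraint_rdf(eid) for eid in entities_to_refresh(raw_events)}
```

Because the event is only emitted after the check has finished, the pull triggered by it should find the results already cached, which is what avoids the race described in the note. Deduplicating per batch keeps an item that was checked several times in a row from being fetched repeatedly.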

Relevant files:
WDQS:
https://github.com/wikimedia/wikidata-query-rdf/blob/728a6cf8665a6535585af3d5805aeda07f3517f6/tools/src/main/java/org/wikidata/query/rdf/tool/change/KafkaPoller.java
Constraints extension:
https://github.com/wikimedia/mediawiki-extensions-WikibaseQualityConstraints

Event Timeline

Jonas triaged this task as Medium priority. Aug 3 2018, 9:08 AM
Jonas created this task.
Jonas removed the point value for this task. Aug 3 2018, 9:12 AM
WMDE-leszek set the point value for this task to 13. Aug 21 2018, 1:17 PM

@Smalyshev I guess when reloading a query service instance with all of the data, you won't really be able to use the API or event bus data to do that?
You'll need a dump?
If so, that should be possible with T204024: Store WikibaseQualityConstraint check data in persistent storage instead of in the cache.

Addshore changed the task status from Open to Stalled. Jun 27 2019, 12:24 AM

We could move forward with work in this area now and expose the checks that we do run on edit and cache, in a queue for WDQS to pick up.
But it is probably worth waiting for T204031, T204024 and T214362 to all be done, so setting to stalled for now.

Addshore changed the task status from Stalled to Open. Aug 11 2021, 12:47 PM

This is no longer stalled on having the constraint checks run, as they run after every edit.
However, they are still not persistently stored.
But un-stalling nonetheless.

Looks promising and important -- usage and quality of constraints will certainly increase once this is resolved.