
Expose constraint violations to WDQS using event queue
Open, Stalled, Medium · Public · 13 Story Points

Description

As a user I would like to see all constraint violations in WDQS, so I can fix them.

GIVEN a constraint check is executed
WHEN a constraint violation is detected
THEN I should be able to query it with SPARQL.

GIVEN a constraint check is executed
WHEN no constraint violation is detected
THEN constraint violations should no longer show up when queried with SPARQL.

NOTE: At the moment constraint violations are only imported to WDQS if they are already cached at the moment WDQS pulls the constraint violation RDF for an item. There is a race condition between the WDQS poller and the constraint check execution, which is why only a fraction of constraint violations are imported.
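
To make the race concrete, here is a minimal sketch using an in-memory map as a stand-in for the constraint check cache. The class and method names (`ConstraintRaceSketch`, `finishCheck`, `pollViolations`) and the entity ID `Q42` are made up for illustration and are not taken from WDQS or WikibaseQualityConstraints.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the constraint check cache; not the extension's real code. */
final class ConstraintRaceSketch {
    // entity ID -> cached violations; an entry exists only once a check has finished
    private final Map<String, List<String>> cache = new HashMap<>();

    /** Simulates a constraint check finishing and caching its result. */
    void finishCheck(String entityId, List<String> violations) {
        cache.put(entityId, List.copyOf(violations));
    }

    /** Simulates the WDQS poller pulling violations: only cached results are visible. */
    List<String> pollViolations(String entityId) {
        return cache.getOrDefault(entityId, List.of());
    }

    public static void main(String[] args) {
        ConstraintRaceSketch sketch = new ConstraintRaceSketch();
        // The poller wins the race: nothing is cached yet, so nothing is imported.
        System.out.println("poll before check: " + sketch.pollViolations("Q42"));
        // The check finishes afterwards; only a later poll would see the result.
        sketch.finishCheck("Q42", List.of("example violation"));
        System.out.println("poll after check: " + sketch.pollViolations("Q42"));
    }
}
```

If the poll happens before the check finishes, the violation is missed unless something triggers another poll for that item.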

When T189458: re-enable wdqs kafka poller is working, we can create a dedicated event for when a constraint check is finished, so that the WDQS poller always pulls cached violations.

Tasks:

  • Create a new event in the Kafka data pipeline when a constraint check is finished. The event would include the entity ID that was being checked.
  • Listen to the event in WDQS and pull the data for that specific item.
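
The two tasks above could look roughly like the following. This is only a sketch under assumptions: the event class `ConstraintCheckFinished`, the handler `ConstraintCheckListener` and the injected fetcher function are hypothetical, and an in-memory map stands in for the query service. It uses no Kafka client code and does not reflect how `KafkaPoller` is actually structured.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical event payload: the task only says it would include the checked entity ID. */
final class ConstraintCheckFinished {
    final String entityId;

    ConstraintCheckFinished(String entityId) {
        this.entityId = entityId;
    }
}

/** Hypothetical WDQS-side handler; the fetcher stands in for pulling the item's violation data. */
final class ConstraintCheckListener {
    private final Function<String, List<String>> fetchViolations;
    // entity ID -> violations currently stored in the (simulated) query service
    private final Map<String, List<String>> store = new HashMap<>();

    ConstraintCheckListener(Function<String, List<String>> fetchViolations) {
        this.fetchViolations = fetchViolations;
    }

    /** Pulls the data for the checked item and replaces whatever was stored for it. */
    void onEvent(ConstraintCheckFinished event) {
        List<String> fresh = fetchViolations.apply(event.entityId);
        if (fresh.isEmpty()) {
            // No violations detected: drop stale ones so they no longer show up in queries.
            store.remove(event.entityId);
        } else {
            store.put(event.entityId, List.copyOf(fresh));
        }
    }

    /** Simulated lookup of the violations currently stored for an entity. */
    List<String> query(String entityId) {
        return store.getOrDefault(entityId, List.of());
    }

    public static void main(String[] args) {
        Map<String, List<String>> source = new HashMap<>();
        ConstraintCheckListener listener =
                new ConstraintCheckListener(id -> source.getOrDefault(id, List.of()));

        source.put("Q42", List.of("example violation"));
        listener.onEvent(new ConstraintCheckFinished("Q42"));
        System.out.println("after failing check: " + listener.query("Q42"));

        source.remove("Q42");
        listener.onEvent(new ConstraintCheckFinished("Q42"));
        System.out.println("after clean check: " + listener.query("Q42"));
    }
}
```

Replacing the stored set on every event, rather than only adding to it, is what would cover the second acceptance criterion: once a check finds no violations, the old ones disappear from query results.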

Relevant files:
WDQS:
https://github.com/wikimedia/wikidata-query-rdf/blob/728a6cf8665a6535585af3d5805aeda07f3517f6/tools/src/main/java/org/wikidata/query/rdf/tool/change/KafkaPoller.java
Constraints extension:
https://github.com/wikimedia/mediawiki-extensions-WikibaseQualityConstraints

Event Timeline

Jonas triaged this task as Medium priority. · Aug 3 2018, 9:08 AM
Jonas created this task.
Restricted Application added a subscriber: Aklapper. · Aug 3 2018, 9:08 AM
Jonas removed the point value for this task. · Aug 3 2018, 9:12 AM
Salgo60 added a subscriber: Salgo60. · Aug 5 2018, 6:26 AM
Addshore updated the task description. · Aug 21 2018, 12:17 PM
Addshore updated the task description. · Aug 21 2018, 12:44 PM
WMDE-leszek updated the task description. · Aug 21 2018, 1:15 PM
WMDE-leszek updated the task description.
WMDE-leszek set the point value for this task to 13. · Aug 21 2018, 1:17 PM
Jonas updated the task description. · Aug 31 2018, 9:46 AM

@Smalyshev I guess when reloading a query service instance with all of the data you won't really be able to use the API or event bus data to do that?
You'll need a dump?
If so, that should be possible with T204024: Store WikibaseQualityConstraint check data in persistent storage instead of in the cache.

Addshore changed the task status from Open to Stalled. · Jun 27 2019, 12:24 AM

We could move forward with work in this area now and expose the checks that we already run on edit and cache, using a queue for WDQS to pick up.
But it is probably worth waiting for T204031, T204024 and T214362 to all be done, so setting to stalled for now.