This RFC is a result of T204024, and specifically the request for an RFC in T204024#4891344.
**Vocabulary**
* WBQC: the WikibaseQualityConstraints MediaWiki extension, deployed on wikidata.org.
* WDQS: the Wikidata Query Service, https://query.wikidata.org.
**Current situation**
WBQC runs checks on Wikidata entities on demand from users.
Results of these constraint checks are stored in memcached with a default TTL of 86400 seconds (1 day).
WBQC checks are accessible via 3 methods:
- RDF action https://www.wikidata.org/wiki/Q123?action=constraintsrdf
- Special page https://www.wikidata.org/wiki/Special:ConstraintReport/Q123
- API https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=Q123
The special page and API can be used by users directly; the API is also called whenever a logged-in user visits an entity page, to display the results on the entity page.
Executions of the API will result in constraint checks being run if the stored data is out of date, or is absent/evicted from the cache for the entity.
Executions of the special page currently always re-run the constraint checks; they neither read from nor write to the cache.
The RDF action exists for use by the WDQS and will not trigger a constraint check run; it only retrieves the RDF representation of the currently stored constraint check results.
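The three access methods above are plain MediaWiki URLs. As a purely illustrative sketch (WBQC itself is a PHP extension; this client-side code is not part of it), a request URL for the `wbcheckconstraints` API module can be built like this:

```python
# Illustrative only: shows how a client might build a request URL for the
# wbcheckconstraints API module listed above. The `format=json` parameter is
# the standard MediaWiki API output selector, not something WBQC-specific.
from urllib.parse import urlencode

API_BASE = "https://www.wikidata.org/w/api.php"

def constraint_check_url(entity_id: str) -> str:
    """Build a wbcheckconstraints API request URL for one entity."""
    params = {
        "action": "wbcheckconstraints",
        "format": "json",
        "id": entity_id,
    }
    return API_BASE + "?" + urlencode(params)

print(constraint_check_url("Q123"))
```

The special page and RDF action follow the same pattern with path-based URLs (`Special:ConstraintReport/Q123`, `Q123?action=constraintsrdf`).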
When retrieved from the cache, the WBQC extension has logic built-in to determine if the stored result needs to be updated (because something in the dependency graph has changed).
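A minimal sketch of that staleness logic, with all names invented (the real implementation lives in the WBQC PHP code): the stored result remembers the revision of each entity it depended on, and is considered stale if any of those revisions has since moved on.

```python
# Hypothetical sketch of the dependency-graph staleness check described above.
# All class and function names are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class CachedCheckResult:
    entity_id: str
    results: dict
    # entity id -> revision id of that entity when the checks were run
    dependency_revisions: dict = field(default_factory=dict)

def is_stale(cached: CachedCheckResult, current_revision) -> bool:
    """True if any entity in the dependency graph changed since storage."""
    return any(
        current_revision(dep_id) != stored_rev
        for dep_id, stored_rev in cached.dependency_revisions.items()
    )

# Example: Q123's checks depended on Q123 itself and on property P31.
cached = CachedCheckResult("Q123", {"P31": "compliance"},
                           {"Q123": 100, "P31": 57})
latest = {"Q123": 100, "P31": 58}  # P31 was edited after the checks ran
print(is_stale(cached, latest.get))
```

The same validity check would apply unchanged if the results move from memcached to more persistent storage.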
We are in the process of rolling out a JobQueue job that will re-run constraint checks for an entity post-edit, rather than only on demand by a user. T204031
Once constraint checks are stored more persistently, we will be able to expose an event queue of newly generated checks for ingestion into the WDQS. T201147
Loading/re-loading of data into the WDQS will also present the need to dump all constraint checks.
5,644 out of 5,767 properties on Wikidata currently have constraints that require a (cacheable) check execution.
Roughly 1.85 million items do not have any statements (currently), leaving 52 million items that do have statements and need to have constraint checks run.
Constraint checks also run on Properties and Lexemes, but their numbers are negligible compared with Items.
Constraint checks on an item can take widely varying amounts of time to execute, depending on the constraints used. Full constraint checks are logged if they take longer than 5 seconds (INFO) or 55 seconds (WARNING), and the performance of all constraint checks is monitored in Grafana.
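The tiered logging described above (the 5 s and 55 s thresholds are from the text; the code itself is an invented illustration, not the WBQC implementation) amounts to:

```python
# Sketch of tiered slow-check logging: full constraint checks taking longer
# than 5 s are logged at INFO, longer than 55 s at WARNING. Thresholds match
# the text; everything else here is illustrative.
import logging

INFO_THRESHOLD_S = 5
WARNING_THRESHOLD_S = 55

def log_level_for_duration(seconds: float):
    """Pick a log level for a full constraint check that took `seconds`."""
    if seconds > WARNING_THRESHOLD_S:
        return logging.WARNING
    if seconds > INFO_THRESHOLD_S:
        return logging.INFO
    return None  # fast enough: not logged

print(logging.getLevelName(log_level_for_duration(60)))
```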
Some full constraint checks reach the current interactive PHP time limit while being generated for special pages or the API.
**Problem statement**
Primary problem statement:
- Constraint check results need to be loaded into the WDQS, but we don't currently have the results of all constraint checks for all Wikidata entities stored anywhere.
Secondary problem statements:
- Generating constraint reports when the user requests them leads to a bad user experience as they must wait for a prolonged amount of time.
- Users can flood the API generating constraint checks for entities, putting unnecessary load on the app servers.
**Solution proposal**
- Rather than defaulting to running constraint checks upon a user's request, primarily pre-generate constraint check results post-edit using the job queue. T204031
- Rather than storing constraint check results in memcached, store them in a more permanent storage solution.
- When new constraint check results are stored, fire an event for the WDQS to listen to so that it can load the new constraint check data.
- Dump constraint check data from the persistent storage to allow for dumping to file and loading into WDQS.
- Use the logic that already exists to determine if the stored constraint check data needs updating when it is retrieved.
- Alterations to the special page: load from the cache? Provide the timestamp of when the checks were run? Provide a button on the page to manually purge the checks and re-run them (getting the latest results).
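Taken together, the proposal implies a read path roughly like the following. This is a minimal sketch under stated assumptions (all names are invented; storage, validity checking, and job scheduling are stand-ins for the real components):

```python
# Minimal sketch of the proposed read path: serve persistently stored results
# when present and still valid, otherwise schedule a regeneration job instead
# of running the checks inline. All names here are illustrative assumptions.

def get_constraint_checks(entity_id, storage, still_valid, schedule_job):
    """Return stored results if valid, else enqueue a re-check job."""
    stored = storage.get(entity_id)
    if stored is not None and still_valid(stored):
        return stored
    schedule_job(entity_id)  # job queue re-runs the checks asynchronously
    return None

# Tiny in-memory demonstration:
storage = {"Q1": {"results": "...", "valid": True}}
scheduled = []
result = get_constraint_checks(
    "Q2", storage, lambda s: s["valid"], scheduled.append
)
print(result, scheduled)  # Q2 is not stored, so a job is scheduled
```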
Note: Even when constraint checks are run after every entity edit, the persistently stored data will slowly become out of date (and therefore also the data stored in the WDQS). The issue of one edit needing to trigger constraint checks on multiple entities is considered a separate problem and is not in the scope of this RFC.