
investigate options for regularly running constraint checks
Closed, Resolved, Public


Things to potentially look into:

  • run checks for items that have been edited (possibly excluding bot edits)
  • measure the average duration of a constraint check run on a single item
  • have a special job queue to run constraint checks after edits
  • results of constraint checks are only stored in the cache, so they can be evicted if there are too many results to store
  • links table (a possible way to figure out which items are affected by an edit)
  • evaluate constraint definitions to choose a checking strategy (e.g. items that only have "local checks", i.e. checks that only affect the "current" item)

Going through the list of options, including creating actionable tasks, is limited to 4 hours.
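The links-table and checking-strategy ideas above could be combined roughly as follows. This is a minimal sketch, not the WikibaseQualityConstraints implementation: `LinksTable`, `items_to_recheck`, and the set of constraint types treated as "local" are hypothetical, and real constraints are attached to properties rather than classified per item. The idea is that the edited item is always rechecked, while an item linking to it only needs a recheck if one of its constraints reads data from other items.

```python
from collections import defaultdict

# Hypothetical classification: "local" constraint types only read the item
# being checked, so edits to other items cannot change their results.
LOCAL_CONSTRAINTS = {"format", "one-of", "range", "single-value"}


class LinksTable:
    """Hypothetical in-memory stand-in for a links table."""

    def __init__(self):
        # target item -> set of items whose statements link to it
        self._incoming = defaultdict(set)

    def add_link(self, source, target):
        self._incoming[target].add(source)

    def linking_items(self, target):
        return set(self._incoming.get(target, ()))


def items_to_recheck(edited_item, links, constraint_types_by_item):
    """Return the edited item plus linking items that have non-local constraints."""
    affected = {edited_item}
    for source in links.linking_items(edited_item):
        types = constraint_types_by_item.get(source, set())
        # Only non-local constraints on the linking item can depend on this edit.
        if types - LOCAL_CONSTRAINTS:
            affected.add(source)
    return affected


links = LinksTable()
links.add_link("Q2", "Q1")
links.add_link("Q3", "Q1")
types = {"Q2": {"format"}, "Q3": {"value type"}}
print(sorted(items_to_recheck("Q1", links, types)))  # ['Q1', 'Q3']
```

Q2 is skipped because its only constraint is local, which is the kind of saving the last bullet point is after.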

Event Timeline

WMDE-leszek created this task.
WMDE-leszek moved this task from Incoming to Ready to estimate on the Wikidata-Campsite board.

Looking at the current edit and constraint-checking rates, it seems that checking constraints on every edit might be feasible. We will go forward with another staged rollout, queueing constraint-checking jobs for n percent of edits and gradually increasing n.
Jobs should be deduplicated, and the race condition between the gadget and the job should be considered.
Add more Grafana tracking to get the run time of checks per item (min, max, avg).
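A minimal sketch of this plan, assuming a single-process queue: `in_rollout`, `ConstraintCheckQueue`, and bucketing by a hash of the item ID are hypothetical choices for illustration (sampling individual edits at random would be an alternative). It skips bot edits if asked, keeps at most one pending job per item, and records per-check durations from which min, max, and average can be reported. It does not address the race with the gadget.

```python
import hashlib
import time
from collections import deque


def in_rollout(item_id, percent):
    """Deterministically place an item into one of 100 buckets; accept the first `percent`."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


class ConstraintCheckQueue:
    """Hypothetical job queue holding at most one pending check per item."""

    def __init__(self, rollout_percent):
        self.rollout_percent = rollout_percent
        self._pending = deque()
        self._pending_ids = set()
        self.durations = []  # seconds spent checking each item

    def on_edit(self, item_id, is_bot=False):
        """Queue a check after an edit; return True if a new job was queued."""
        if is_bot or not in_rollout(item_id, self.rollout_percent):
            return False
        if item_id in self._pending_ids:
            return False  # deduplicate: a check for this item is already pending
        self._pending.append(item_id)
        self._pending_ids.add(item_id)
        return True

    def run_next(self, check):
        """Pop the oldest job, run `check` on it, and record how long it took."""
        item_id = self._pending.popleft()
        self._pending_ids.discard(item_id)
        start = time.perf_counter()
        check(item_id)
        self.durations.append(time.perf_counter() - start)
        return item_id

    def stats(self):
        """Return (min, max, avg) run time in seconds, or None if nothing ran yet."""
        if not self.durations:
            return None
        return (min(self.durations), max(self.durations),
                sum(self.durations) / len(self.durations))
```

Removing the item from the pending set before its check runs means an edit arriving mid-check queues a fresh job instead of being dropped, which errs on the side of rechecking.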

The second step would be storing the results in a dedicated table instead of the cache. We will also store metadata that allows us to decide when to recheck an item (related items, type of checks, last run).

The second step will also fix the problem of reimporting the data after a Wikidata-Query-Service node goes down.
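A sketch of such a table, using SQLite only to keep the example self-contained. The table and column names, the stored revision ID, and the helper functions are hypothetical. `related_items` records which other items a check read, so an edit to one of them can trigger rechecks, and `dump_all` shows how stored results could be re-fed to a rebuilt query service node without rerunning the checks.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS constraint_results (
    item_id     TEXT PRIMARY KEY,
    results     TEXT NOT NULL,    -- check results serialized as JSON
    check_types TEXT NOT NULL,    -- JSON list of constraint types that were run
    last_run    REAL NOT NULL,    -- Unix timestamp of the last check
    item_rev    INTEGER NOT NULL  -- hypothetical: revision of the checked item
);
CREATE TABLE IF NOT EXISTS related_items (
    item_id    TEXT NOT NULL,     -- item whose results are stored
    related_id TEXT NOT NULL,     -- other item whose data the check read
    PRIMARY KEY (item_id, related_id)
);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn, item_id, rev, results, check_types, related):
    """Replace the stored results and related-item list for one item."""
    with conn:  # commit both tables together
        conn.execute(
            "INSERT OR REPLACE INTO constraint_results VALUES (?, ?, ?, ?, ?)",
            (item_id, json.dumps(results), json.dumps(sorted(check_types)),
             time.time(), rev),
        )
        conn.execute("DELETE FROM related_items WHERE item_id = ?", (item_id,))
        conn.executemany(
            "INSERT INTO related_items VALUES (?, ?)",
            [(item_id, r) for r in sorted(set(related))],
        )


def needs_recheck(conn, item_id, current_rev, max_age_seconds):
    """Recheck if never checked, if the item changed, or if the result is too old."""
    row = conn.execute(
        "SELECT item_rev, last_run FROM constraint_results WHERE item_id = ?",
        (item_id,),
    ).fetchone()
    if row is None:
        return True
    stored_rev, last_run = row
    return stored_rev != current_rev or time.time() - last_run > max_age_seconds


def dependents_of(conn, edited_id):
    """Items whose stored results read data from `edited_id`."""
    rows = conn.execute(
        "SELECT item_id FROM related_items WHERE related_id = ?", (edited_id,)
    )
    return {r[0] for r in rows}


def dump_all(conn):
    """Yield every stored result, e.g. to repopulate a rebuilt query service node."""
    for item_id, results in conn.execute(
        "SELECT item_id, results FROM constraint_results ORDER BY item_id"
    ):
        yield item_id, json.loads(results)
```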

I guess this is done, as we had the meeting and wrote everything down.
I'm going to convert our thoughts and comments into other tickets today, so I will leave it in peer review for now and self-assign.