
Reduce potential load caused by ImportConstraintStatements.php maintenance script
Closed, Resolved · Public · 5 Estimated Story Points

Description

The ImportConstraintStatements.php maintenance script in WikibaseQualityConstraints currently imports constraint statements for all properties in a simple loop, with no offset, limit, batching, or sleeping. This can potentially lead to a high DB load, and high replication lag.

Example:
We could group the imports into batches (with some default batch size, e.g. 100) and sleep between batches (default e.g. 3 s). Batch size and sleep time should be configurable via command-line arguments.

Some offset/limit arguments could also be added, so that a deployer can first import constraint statements for the first 1000 properties, then proceed later with the next 1000, and so on.

Acceptance criteria:

  • The batch size of the maintenance script is configurable
  • The script includes a wait for replication between batches
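The batching loop implied by these criteria could be sketched roughly as follows. This is a hypothetical illustration, not the extension's actual code: `importInBatches()`, the `$import` callback (standing in for running one `UpdateConstraintsTableJob` synchronously), and the `$waitForReplication` callback are all stand-ins for the real MediaWiki/WikibaseQualityConstraints APIs.

```php
<?php
// Hypothetical sketch of the batched import loop described above.
// $import and $waitForReplication stand in for the real APIs.

function importInBatches(
	array $propertyIds,
	int $batchSize,
	callable $import,
	callable $waitForReplication
): int {
	$batches = 0;
	foreach ( array_chunk( $propertyIds, $batchSize ) as $batch ) {
		foreach ( $batch as $propertyId ) {
			$import( $propertyId ); // import constraints for one property
		}
		$waitForReplication(); // let replicas catch up before the next batch
		$batches++;
	}
	return $batches;
}

// Usage with stub callbacks: 10 properties at batch size 3 → 4 batches.
$ids = array_map( fn ( $i ) => "P$i", range( 1, 10 ) );
echo importInBatches( $ids, 3, fn ( $id ) => null, fn () => null ), "\n"; // 4
```

Making `$batchSize` a command-line option would satisfy the first criterion; the `$waitForReplication()` call between chunks satisfies the second.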

Original Comment:

As a deployer, I want to run the extensions/WikibaseQualityConstraints/maintenance/ImportConstraintStatements.php maintenance script when necessary in order to resolve problems with the constraint definitions, without having to worry about its impact on the wiki.

This only takes a few minutes – less than 10 ms for most properties, not much more for some larger properties, and we’re still well below 10000 properties – but it was enough to cause a visible traffic spike on the s8 master db host during the deployment for T223372 (around 13:17 UTC). As Wikidata keeps getting more properties, I believe this will gradually become more important.

Event Timeline

Addshore moved this task from incoming to needs discussion or investigation on the Wikidata board.
Addshore subscribed.

Is batch+sleep the best approach? Do we need offset+limit?

So, looking at the maintenance script and the job code that is actually run, batching already occurs.
Firstly, the maintenance script runs a job for each property synchronously.
Within those jobs some batching, or a kind of batching, seems to happen, but it is not configurable and doesn't look perfect.
https://github.com/wikimedia/mediawiki-extensions-WikibaseQualityConstraints/blob/26846fd7320993f13570392232b6e358a3378a4f/src/UpdateConstraintsTableJob.php#L131-L134
Although the batch size is 10, in theory more than 10 items could actually end up in a single batch.

The most important thing here is probably to wait for replication lag somewhere, which neither the script nor the job appears to do.
It probably makes sense to do this just after the inserts, e.g. https://github.com/wikimedia/mediawiki-extensions-WikibaseQualityConstraints/blob/26846fd7320993f13570392232b6e358a3378a4f/src/UpdateConstraintsTableJob.php#L132

In terms of arbitrary sleeps that a runner of the maintenance script could pass in, we could do this, but DB-wise there is probably no need.

> What should the default batch size and sleep time be?

I answered the sleep question above: it can just be a wait for replication.

As for batch size, it looks like calling insertBatch only results in a single insert, which is nice.
And given the current default in the job, right now we are mostly working with batches of 10.
https://github.com/wikimedia/mediawiki/blob/master/maintenance/Maintenance.php#L1709 currently defines the default batch size for maintenance scripts as 200; it's probably fine to use that default and pass it down to the job.
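Passing that default down could look something like the sketch below. This is not the actual extension code: `buildJobParams()`, the `batchSize` parameter key, and the option handling are hypothetical, illustrating only the idea of the maintenance script's (possibly CLI-overridden) batch size flowing into the job's parameters.

```php
<?php
// Hypothetical sketch: the maintenance script's batch size (CLI option or
// default) is forwarded into the job parameters. Names are illustrative.

const DEFAULT_BATCH_SIZE = 200; // Maintenance.php's current default

function buildJobParams( string $propertyId, ?int $cliBatchSize ): array {
	return [
		'propertyId' => $propertyId,
		// fall back to the maintenance-script default when no option is given
		'batchSize'  => $cliBatchSize ?? DEFAULT_BATCH_SIZE,
	];
}

echo buildJobParams( 'P31', null )['batchSize'], "\n"; // 200
echo buildJobParams( 'P31', 50 )['batchSize'], "\n";   // 50
```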

Addshore moved this task from Incoming to Needs Work on the Wikidata-Campsite board.

@Lucas_Werkmeister_WMDE I'll leave this in the Needs Work column for you to read and possibly amend the description :)
Then I think it is ready for the "Ready to go" column, mainly for the wait for replication, which we should add ASAP.
Marking as low priority in terms of things we are picking up on the campsite, though.

> Although batch size is 10, more than 10 things could actually end up being in a single batch in theory.

I’m not sure why that batch size is there (it’s already present in the first patch set of the change introducing the class). Perhaps we can just remove it? Most properties have fewer than 10 constraint statements, and none have more than 50 (summary), so your suggested size of 200 would effectively disable this level of batching anyway. I think it makes more sense to batch at the maintenance script level than in the job.

> The most important thing here is probably to wait for replication lag somewhere which it doesn't look like the script or the job does.
> It probably makes sense to do this just after the inserts, eg https://github.com/wikimedia/mediawiki-extensions-WikibaseQualityConstraints/blob/26846fd7320993f13570392232b6e358a3378a4f/src/UpdateConstraintsTableJob.php#L132

Does it make sense to wait for replication in a job, especially when most of the time the wait won’t be followed by another insert? I feel like that would just pointlessly block the job runner. (If the job runner wants to wait for replication before proceeding with the next job, that should be its own responsibility.)

Having some sort of limit on batch sizes there is probably a good idea; we would want to stop users from being able to impact the size of the inserts too dramatically. (I wonder how big a batch a malicious actor could create.) But perhaps 10 is too small there; just switching it to 200 would probably be fine :)

Where the wait for replication should be is an interesting question.
I assume these jobs are not only run by the maintenance script, then?

If these jobs are run elsewhere, and we switch to a batch size of 200, then we should probably also add a wait for replication lag in the case where multiple batches occur.
We would likely also want a wait for replication lag in the maintenance script between each execution of the job.

The job also runs every time the constraint statements of a property are edited. (And to create a big batch you’d have to have a property with x constraint statements.)

So there is no problem with having waitForReplication in jobs, but it should probably only be done in the case of batching, which is unlikely for us in production but theoretically possible.

So IMO the job should run the first batch, and then, if it needs to run another batch, it should waitForReplication first.
Right now the batch size isn't configurable, but ideally it would be, so that third-party users can tweak it for their situations (and that's when the batching and waitForReplication would most likely actually come into play).

Then the maintenance script should probably have a waitForReplication after each job's ->run().
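The job-side design proposed here (wait for replication only before a *subsequent* batch, so the common single-batch case never blocks the job runner) could be sketched like this. Again a hypothetical illustration, not the extension's code: `runJobBatches()`, `$insertBatch`, and `$waitForReplication` are stand-ins.

```php
<?php
// Hypothetical sketch: the job runs its first batch immediately and only
// waits for replication before each subsequent batch, so a property whose
// rows fit in one batch never waits. Callables are stand-ins.

function runJobBatches(
	array $rows,
	int $batchSize,
	callable $insertBatch,
	callable $waitForReplication
): int {
	$waits = 0;
	$first = true;
	foreach ( array_chunk( $rows, $batchSize ) as $batch ) {
		if ( !$first ) {
			$waitForReplication(); // only when there is more than one batch
			$waits++;
		}
		$insertBatch( $batch );
		$first = false;
	}
	return $waits; // number of replication waits performed
}

// A property with 15 constraint rows at batch size 10 → one wait.
echo runJobBatches( range( 1, 15 ), 10, fn ( $b ) => null, fn () => null ), "\n"; // 1
// Most properties fit in one batch → no wait at all.
echo runJobBatches( range( 1, 5 ), 10, fn ( $b ) => null, fn () => null ), "\n"; // 0
```

The maintenance-script-side wait after each job run would then be a separate, unconditional waitForReplication in the script's loop.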

ItamarWMDE renamed this task from Make WikibaseQualityConstraints’ ImportConstraintStatements.php maintenance script more database-friendly to Reduce potential load caused by ImportConstraintStatements.php maintenance script.Jan 19 2021, 1:27 PM
ItamarWMDE updated the task description.
darthmon_wmde set the point value for this task to 5.

For a similar example of introducing batching, see the difference between patch sets 2 and 3 of this change for T270247.

Change 661747 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Batch properties in ImportConstraintStatements

https://gerrit.wikimedia.org/r/661747

Change 661748 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Use transaction in UpdateConstraintsTableJob

https://gerrit.wikimedia.org/r/661748

Change 661747 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Batch properties in ImportConstraintStatements

https://gerrit.wikimedia.org/r/661747

Change 661748 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Use transaction in UpdateConstraintsTableJob

https://gerrit.wikimedia.org/r/661748