The WDQS updater has several config options to reduce the concurrency at which it calls the MW API.
The config option wikibase_repo_thread_pool_size controls the size of the thread pool running HTTP requests.
During a test of this application with zookeeper and the flink-k8s-operator we had to backfill around 2 weeks of updates, and this caused a massive load on the mw-api-int cluster.
We lowered this value from 30 to 5, expecting the load to drop to about 1/6, but the reduction was nowhere near that: we barely saw any impact. This suggests that the current limits are already too high and that the system is limited by the endpoint capacity, not by its own settings.
Looking at the code, this limit is imposed on the HTTP thread pool that is attached to each job task. Given that we run at a parallelism of 12, the actual number of concurrent requests is parallelism * wikibase_repo_thread_pool_size.
So we went from 30*12=360 to 5*12=60 concurrent requests.
We should definitely change how this is configured so that it takes the Flink parallelism into account.
AC:
- the updater should have a single option to control the MW request concurrency
- we should probably not run the AsyncOp over all 12 tasks
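
To make the arithmetic above concrete, here is a minimal sketch (in Python, not the updater's actual code) of how a single global limit could be split across tasks. The names `effective_concurrency`, `plan_async_op` and `max_concurrent_requests` are hypothetical and only used for illustration; the sketch assumes the limit is divided evenly and that the AsyncOp parallelism may be lower than the job parallelism.

```python
def effective_concurrency(parallelism: int, pool_size_per_task: int) -> int:
    """Total concurrent requests when every task owns its own thread pool."""
    return parallelism * pool_size_per_task


def plan_async_op(max_concurrent_requests: int, job_parallelism: int) -> tuple[int, int]:
    """Derive (async_op_parallelism, pool_size_per_task) from one global limit.

    Hypothetical helper: the product of the two returned values never
    exceeds max_concurrent_requests.
    """
    if max_concurrent_requests < 1 or job_parallelism < 1:
        raise ValueError("both arguments must be >= 1")
    # Do not spread the operator over more tasks than allowed requests.
    async_parallelism = min(job_parallelism, max_concurrent_requests)
    # Integer division rounds down so the total stays within the limit.
    pool_size = max_concurrent_requests // async_parallelism
    return async_parallelism, pool_size


# The situation described above: the pool size applies per task.
print(effective_concurrency(12, 30))  # 360
print(effective_concurrency(12, 5))   # 60

# With a single global option, a limit of 5 would really mean 5.
print(plan_async_op(5, 12))   # (5, 1)
print(plan_async_op(30, 12))  # (12, 2)
```

With a global limit of 30 and an even split, the total ends up at 24 rather than 30 because of the rounding; whether that is acceptable, or whether the AsyncOp should instead run at a parallelism that divides the limit exactly, is a design choice left open here.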