The cirrus-reindex-orchestrator is a tool that can run multiple reindexes of wiki indices in parallel.
It is limited to 8 shards per cluster in parallel, which means that large wikis (commons) are reindexed one at a time, while up to 8 mwscript runs can be in flight at once for small wikis.
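To make the effect of the shard limit concrete, here is a rough Python sketch of a shard-budget grouping. It is not the orchestrator's actual code: `Wiki`, `plan_waves`, the wave-based model and the shard counts are assumptions made up for illustration; only the limit of 8 comes from the description above.

```python
from dataclasses import dataclass

SHARD_BUDGET = 8  # per-cluster limit quoted in the description


@dataclass(frozen=True)
class Wiki:
    name: str    # hypothetical wiki name
    shards: int  # hypothetical shard count of its index


def plan_waves(wikis, budget=SHARD_BUDGET):
    """Group wikis into waves whose total shard count stays within budget."""
    # Largest wikis first, mirroring "big wikis get reindexed first".
    pending = sorted(wikis, key=lambda w: w.shards, reverse=True)
    waves = []
    while pending:
        used, wave, deferred = 0, [], []
        for wiki in pending:
            # A wiki with more shards than the budget takes the whole budget.
            cost = min(wiki.shards, budget)
            if used + cost <= budget:
                wave.append(wiki.name)
                used += cost
            else:
                deferred.append(wiki)
        waves.append(wave)
        pending = deferred
    return waves
```

With one 8-shard wiki and ten 1-shard wikis, this yields one wave holding only the large wiki, then a wave of 8 small wikis, then a wave of 2. That matches the pattern described: the number of concurrent mwscript runs jumps once the large wikis are done.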
Unfortunately, deploying multiple mwscript-k8s runs at once has some impact on the k8s API response times:
The timings degrade over the course of a run: big wikis get reindexed first, and as more small wikis are then processed concurrently the pressure on the k8s resources increases.
We could investigate ways to make this process less impactful on the k8s APIs:
- investigate using --local_dblist, it's possibly acceptable for small wikis?
- complete refactor: use the mediawiki API to return the mapping/index config and schedule the reindex from Python instead of from the maint script
- workaround: review the concurrency limits and make the process slower overall
- possible small optimization: the cleanup of helm deployments is not batched; batching the cleanups might help a bit (if running helmfile destroy on multiple releases at once helps)
- other ideas?
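As a starting point for the workaround and the cleanup batching ideas above, here is a minimal Python sketch. Everything in it is hypothetical: `admit`, `max_jobs`, `cleanup_in_batches`, `destroy_batch` and the default values are illustrative names and numbers, not existing orchestrator code. Whether a single helmfile destroy over several releases actually reduces API load is still the open question from the list.

```python
from itertools import islice


def admit(pending, running_jobs, free_shards, max_jobs=3):
    """Pick wikis to start now, honouring free shards AND a cap on jobs.

    pending is a list of (name, shards) tuples; max_jobs is an assumed,
    tunable cap that is independent of the shard limit.
    """
    started = []
    for name, shards in pending:
        if running_jobs + len(started) >= max_jobs:
            break  # job cap reached, even if shards are still free
        if shards <= free_shards:
            started.append(name)
            free_shards -= shards
    return started


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def cleanup_in_batches(releases, destroy_batch, batch_size=4):
    """Call destroy_batch once per group of releases; return the call count.

    destroy_batch is a caller-supplied callable (for example a wrapper that
    runs one destroy command for the whole group).
    """
    calls = 0
    for batch in batched(releases, batch_size):
        destroy_batch(batch)
        calls += 1
    return calls
```

With 10 finished releases and `batch_size=4`, `destroy_batch` is called 3 times instead of 10. Similarly, `admit` with `max_jobs=3` starts at most 3 small wikis at once even when 8 shards are free, which trades total run time for fewer simultaneous deployments.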
AC:
- running a full reindex does not cause the k8s API response times to increase
