We currently have four MediaWiki clusters:
- api
- app
- jobrunners/videoscalers
- parsoid
Now that we are moving to k8s, it has been suggested that a different cluster layout might serve our needs better.
Specifically:
- One cluster to serve live users. This means any wiki page (/wiki/... or /w/index.php) and any API call (/w/{rest,api}.php with a session token) NOT coming from a public cloud. It will be split into -rw and -ro subcategories, as it will be served from all datacenters
- One cluster to serve external API requests and any other requests from the public clouds. This cluster will also have both rw and ro endpoints
- One cluster to serve calls from Toolforge. We might want to merge this with the external API cluster, at least at first, but I think it's good to keep the two groups separated.
- One cluster to serve internal requests - for example, when a service needs to make an API call while preparing a response for a live client. This might potentially be the same cluster as the one serving live users.
- One cluster to serve async processing, which would include MediaWiki jobs but also calls from other services that need to update their cached content, such as restbase-async, the WDQS Updater, or the upcoming Search Update pipeline
- One cluster (probably on baremetal at least at the start!) for running videoscaling.
- One cluster (of 2/4 pods) for mwdebug/testing
- One cluster (of 2/4 pods) for wikitech
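To make the split above concrete, here is a minimal sketch of the routing decision an edge layer would make. The cluster names (mw-web, mw-external-api, mw-toolforge) and the exact matching rules are illustrative assumptions, not the real configuration.

```python
def choose_cluster(path: str, has_session: bool = False,
                   from_public_cloud: bool = False,
                   from_toolforge: bool = False) -> str:
    """Pick a serving cluster for an incoming request.

    Cluster names are hypothetical; the rules mirror the proposal:
    Toolforge and public-cloud traffic get their own clusters, and
    everything else that looks like live-user traffic goes to mw-web.
    """
    if from_toolforge:
        return "mw-toolforge"
    if from_public_cloud:
        return "mw-external-api"
    if path.startswith("/wiki/") or path.startswith("/w/index.php"):
        return "mw-web"
    if path.startswith(("/w/api.php", "/w/rest.php")):
        # A session token marks a logged-in "live" user; anything
        # else on the API endpoints counts as external API traffic.
        return "mw-web" if has_session else "mw-external-api"
    return "mw-external-api"
```

Each cluster would then expose its own -rw and -ro service endpoints; this sketch only decides the cluster, not the read/write split.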
The reason for this proposal is separation of concerns - we want to be able to privilege, under duress, live users over everything else. Ideally we would have a simple script that scales down everything else and gives our full capacity to live users. There is a special provision here for Toolforge because we know many important tools run there, and they are fundamental to the good functioning of the wikis.
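The "simple script" mentioned above could be as small as the sketch below, which only builds the kubectl commands rather than running them. The deployment names, namespaces, and replica counts are made-up placeholders; the point is that live-user and Toolforge clusters are explicitly protected from being scaled down.

```python
# Replica targets for non-critical clusters under duress.
# Names and numbers are hypothetical examples.
NON_CRITICAL = {
    "mw-external-api": 2,   # keep token capacity for external API clients
    "mw-async": 0,          # pause job/async processing entirely
    "mw-videoscaler": 0,    # videoscaling can wait
}

# Clusters that must never be scaled down by this script.
PROTECTED = {"mw-web", "mw-toolforge"}

def duress_commands(replicas: dict = NON_CRITICAL) -> list:
    """Return the kubectl commands that shrink every non-critical cluster."""
    cmds = []
    for deploy, n in sorted(replicas.items()):
        # Guard against accidentally listing a protected cluster.
        assert deploy not in PROTECTED, f"refusing to scale {deploy}"
        cmds.append(
            f"kubectl -n {deploy} scale deployment {deploy} --replicas={n}"
        )
    return cmds
```

In practice this would likely be driven through the deployment tooling rather than raw kubectl, but the shape of the operation - a single list of scale-downs with an explicit protected set - stays the same.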