While there's ongoing work to create a new architecture for dumps, we still need to support the current version. In concise terms, what happens now is that a set of python scripts periodically launches some mediawiki maintenance scripts, and the (very large) output of those is written to an NFS volume mounted from the dumpsdata host.
While I think it would be possible, with some extreme gymnastics, to make these run in kubernetes, I see little value in doing so, especially as it's a system we want to eventually sunset. In case this assumption of mine is wrong, we might want to reconsider the plan I'm going to lay out in this task.
Specifically, I start from the assumption that we want two things from migrating to kubernetes:
- Get rid of all the mediawiki-related puppet code, eventually
- Get rid of any non-k8s-related scap features, unify our deployments
I think this can be achieved as follows:
- We install the container runtime on the snapshot hosts
- As part of scap, we presync the mediawiki php image to all kubernetes nodes; we would extend this to also push the image to the snapshot hosts
- We provide a wrapper script to run mediawiki maintenance scripts from a docker image, mounting the right volumes
- We run the needed sidecars (I think we only need mcrouter) as docker containers on these hosts
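The wrapper script from the list above could be sketched roughly as follows. Everything in it is a hypothetical placeholder to illustrate the idea (the image name and registry, the `/mnt/dumpsdata` mount path, the `maintenance/run.php` entry point), not the actual production values:

```shell
#!/bin/bash
# Hypothetical sketch of a wrapper that runs a mediawiki maintenance script
# inside the pre-synced mediawiki php image on a snapshot host.
# Image name, registry and mount path below are illustrative only.
set -euo pipefail

mw_maintenance() {
  local image="${MW_IMAGE:-docker-registry.example.org/mediawiki-multiversion:latest}"
  local dumps_mount="${DUMPS_MOUNT:-/mnt/dumpsdata}"  # NFS volume from the dumpsdata host

  local cmd=(docker run --rm
    --network host                        # so the script can reach the local mcrouter sidecar
    -v "${dumps_mount}:${dumps_mount}"    # dump output lands directly on the NFS mount
    "${image}"
    php maintenance/run.php "$@")

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[@]}"                      # print the docker command instead of running it
  else
    exec "${cmd[@]}"
  fi
}
```

Invocation on a snapshot host would then look something like `mw_maintenance dumpBackup.php --wiki=enwiki`, with the container writing its output straight to the mounted NFS volume.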