We added --dblist to mwscript-k8s so that you can invoke foreachwikiindblist within the container, sequentially operating on n wikis using 1 Kubernetes job. This is less resource-intensive than running mwscript-k8s in a for loop, which creates n Kubernetes jobs.
The limitation is that foreachwikiindblist, of course, can only read dblists from the container. Complex expressions like --dblist "s1 + s2" are supported, but if you want to prepare a truly custom list of wikis, there's no easy way to do it.
@Ladsgroup's use case involves constructing a list of wikis in a shell one-liner, like expanddblist s3 | grep -v foo | grep something_weird | xargs -I{} .... @Urbanecm_WMF uses a one-off set of wikis in a loose text file, not committed in the config repo. Presently mwscript-k8s doesn't support either workflow.
We could go a few ways with this:
1. Modify readDbListFile() in WmfConfig.php to make it possible to read from /data, and upload dblists to the container with --file.
2. Add a flag to mwscript-k8s to take a dblist file specifically (e.g. --local-dblist=./mywikis), and ConfigMap it into dblists/.
3. Add a flag to mwscript-k8s to take a list of wikis (e.g. --wikis="foowiki barwiki bazwiki") and ConfigMap that into a file in dblists/.
Open to other ideas, and to preferences between these. I tend to think #3 is the best UX; the shell one-liner case would look something like --wikis="$(expanddblist s3 | munge | twiddle | frobnicate)". (The trouble with #2 is that we already have a --dblist flag which can't be used for this, so a second dblist-style flag is confusion-prone; for this to be really good, I think we'd need an even clearer flag name than --local-dblist. I included #1 for completeness, but I think both the implementation and the resulting UX are worse than either #2 or #3.)
However, @Scott_French points out that with #3 we might need to be careful about shell tokenization. That's true; I think as long as it's just "make sure you quote the $()" then that's okay. (Failure to do so would end up as mwscript-k8s --wikis=foowiki barwiki bazwiki Script.php. That'd get you an error, but a harmless one, unless barwiki was also the name of a maintenance script(!).) I think that's not a dealbreaker, but I'm open to the perspective that it is, and we could proceed with something like --local-dblist <(expanddblist s3 | munge | twiddle | frobnicate) instead.
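To make the tokenization concern concrete, here is a minimal Python sketch of what the program would receive in the quoted and unquoted cases, plus one way a --wikis value could be turned into dblist content. The helper name wikis_to_dblist is hypothetical, shlex.split is only an approximation of shell word splitting for these simple cases, and the one-wiki-per-line file layout is an assumption about the dblist format rather than something decided in this task.

```python
import shlex
from typing import List


def wikis_to_dblist(value: str) -> str:
    """Hypothetical helper: turn a whitespace-separated --wikis value into
    dblist file content, assuming one wiki name per line."""
    wikis: List[str] = value.split()  # splits on spaces, tabs, and newlines
    if not wikis:
        raise ValueError("--wikis must name at least one wiki")
    return "\n".join(wikis) + "\n"


# Approximate the argv the shell would produce once $() has been expanded.
# Quoted: the whole list stays attached to the flag as a single argument.
quoted = shlex.split('mwscript-k8s --wikis="foowiki barwiki bazwiki" Script.php')

# Unquoted: only the first wiki stays with the flag; the rest become
# separate positional arguments, so barwiki lands where the script name goes.
unquoted = shlex.split('mwscript-k8s --wikis=foowiki barwiki bazwiki Script.php')

print(quoted)
print(unquoted)
print(wikis_to_dblist("foowiki barwiki bazwiki"), end="")
```

Because the helper splits on any whitespace, it would accept the newline-separated output of a pipeline just as well as a space-separated string, so the only thing the caller has to get right is the quoting around $().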
One downside of all three approaches is opacity: all we'd be able to tell from examining the Kubernetes object is the dblist's filename, not its contents. To tell what wikis it's operating on, we'd have to go to the logs (or, while the script is running, reach in and cat the dblist).
One more option, for posterity:
- Stop using foreachwikiindblist and build our own Kubernetes-native thing. That new thing could take either a dblist filename or a custom set of wikis, it could provide better transparency, and it could even add cool stuff like checkpointing (if we restart the container, start from the wiki we were working on instead of going back to aawiki). If we had the engineering time to totally modernize the maintenance infrastructure, this would be on my list, but to solve this problem alone it's the wrong investment.
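For the record, the checkpointing idea could look roughly like the sketch below. Everything in it is hypothetical: run_with_checkpoint, the run_one callback, and the JSON checkpoint file are illustrative names and choices, not an existing interface, and a real version would need the checkpoint to live somewhere that survives a container restart.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Set


def run_with_checkpoint(
    wikis: Iterable[str],
    run_one: Callable[[str], object],
    checkpoint: Path,
) -> List[str]:
    """Run run_one(wiki) for each wiki, skipping wikis that an earlier
    attempt already completed. Returns the wikis processed this time."""
    done: Set[str] = set()
    if checkpoint.exists():
        done = set(json.loads(checkpoint.read_text()))

    processed: List[str] = []
    for wiki in wikis:
        if wiki in done:
            continue  # finished before the restart; don't redo it
        run_one(wiki)  # if this raises, the wiki is not marked as done
        done.add(wiki)
        # Persist after every wiki so a restart resumes at the next one.
        checkpoint.write_text(json.dumps(sorted(done)))
        processed.append(wiki)
    return processed
```

A wiki is recorded only after its run finishes, so after a restart the loop picks up at the wiki that was in progress rather than going back to the start of the list. That wiki is rerun from its beginning, which assumes the maintenance script is safe to repeat.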