Background
The l10n cache makes up the largest portion of the published multiversion MediaWiki image. If we can avoid building the cache during the image build and instead generate it at deploy time, we can save a substantial amount of computation at build time, reduce registry storage requirements, and cut network I/O during k8s scheduling.
Proposal
To efficiently generate the cache at deploy time, we would need:
- A PersistentVolume defining local storage at a path on the node for the l10n caches, shared by all pods on each node. It would need to be large enough to hold 2-3 sets of caches, one for each deployed version of MW (each is 4+ GB, but let's round way up to ~20 GB total).
- A PersistentVolumeClaim, defined in the chart or elsewhere, that claims this local l10n PV for MW deployments. Note that if the PV is statically provisioned and defined as ReadWriteMany, only one PVC should be needed.
- An initContainer, defined in the chart's pod template, that mounts the local storage PV during scheduling and runs the rebuildLocalisationCache.php maintenance script for all wikis. A flock on the PV would be used to ensure only one pod per node runs the rebuild.
- The PV is also mounted by the main container, giving the MW runtime access to the l10n cache files (read-only if possible, though I ran into strange issues when the same volume was mounted read-write in the init container and read-only in the main container).
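As a rough sketch, the statically provisioned per-node storage might look like the following. All names, paths, and the storage class here are assumptions for illustration, not necessarily what the PoC gist uses; note that a local PV is pinned to a single node via nodeAffinity, so one such PV would be defined per node, while a single ReadWriteMany PVC can bind for all MW deployments.

```yaml
# Hypothetical sketch: one local PV per node, claimed via a shared PVC.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: l10n-cache-node1        # assumed name; one PV per node
spec:
  capacity:
    storage: 20Gi               # ~20 GB, rounded way up
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: l10n-local  # assumed storage class
  local:
    path: /srv/l10n-cache       # assumed path on the node
  nodeAffinity:                 # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node1         # assumed node name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: l10n-cache
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: l10n-local
  resources:
    requests:
      storage: 20Gi
```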
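The initContainer and the main container's mount could then be sketched in the pod template roughly as follows. Image name, mount paths, and the lock file are assumptions; the flock serializes the rebuild so only one pod per node performs it.

```yaml
# Hypothetical pod template fragment (names/paths are assumptions).
initContainers:
  - name: rebuild-l10n
    image: mediawiki-multiversion   # assumed image name
    volumeMounts:
      - name: l10n-cache
        mountPath: /srv/l10n-cache
    command:
      - /bin/sh
      - -c
      # flock ensures only one pod per node runs the rebuild at a time
      - |
        flock /srv/l10n-cache/rebuild.lock \
          php maintenance/rebuildLocalisationCache.php --threads "$(nproc)"
containers:
  - name: mediawiki
    image: mediawiki-multiversion
    volumeMounts:
      - name: l10n-cache
        mountPath: /srv/l10n-cache
        # readOnly: true would be ideal, though mixing a read-write init
        # mount with a read-only main mount of the same volume was flaky
volumes:
  - name: l10n-cache
    persistentVolumeClaim:
      claimName: l10n-cache
```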
This idea builds directly on what @dancy and @jeena experimented with previously. However, their approach depended on a hostPath volume shared between pods on the same node, which raised some security concerns that I believe this approach avoids.
Proof of Concept
I've developed a small proof of concept around this idea using a local k8s cluster. Please see https://gist.github.com/marxarelli/3719068d447503800565dccda3154bb2 for implementation.
Need for feedback
This proposal needs serviceops feedback as it would rely on them to manage/provision the local storage PV.