This task is to gather ideas and define how buildpack-based images will access NFS and to what extent.
Problem
Buildpacks (specifically the upstream Heroku ones) expect the $HOME variable to be /app when running the built images.
Currently we use that variable to point to the tool home directory on NFS, which is the home of the tool's unix user.
This changes the way users have been using that NFS directory.
Current dependencies on $HOME
For the beta, buildpack images have $HOME hardcoded to point to the app code (/app), while non-buildpack images point it to the NFS tool home directory.
Currently we set both $HOME and workingDir (the directory the process runs from).
Some pieces of code expect the replica credentials to be at $HOME/replica.my.cnf, for example:
- https://gitlab.wikimedia.org/toolforge-repos/python-toolforge/-/blob/main/src/toolforge/__init__.py#L65
- https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/blob/main/etc/LocalSettings.php#L57-67
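To illustrate the kind of dependency meant here, the following is a minimal sketch (not the code from the linked repositories) of a lookup that builds the credentials path from $HOME. The helper name `replica_cnf_path` and the tool path `/data/project/mytool` are assumptions used only for illustration.

```python
import os
from pathlib import PurePosixPath


def replica_cnf_path(environ=None):
    """Return where $HOME-based code would look for the replica credentials.

    Hypothetical helper: it only mirrors the "$HOME/replica.my.cnf" pattern.
    """
    env = os.environ if environ is None else environ
    return PurePosixPath(env.get("HOME", "/")) / "replica.my.cnf"


# Non-buildpack image: $HOME is the tool home on NFS (illustrative path).
nfs_path = replica_cnf_path({"HOME": "/data/project/mytool"})

# Buildpack image: $HOME is hardcoded to /app, so the same code now
# looks next to the application code instead of in the NFS tool home.
app_path = replica_cnf_path({"HOME": "/app"})

print(nfs_path)
print(app_path)
```

The same code resolves to a different location depending on the image type, which is why such tools would stop finding their credentials without changes.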
Some code might expect to be able to write logs under $HOME.
Options
1: Use a different env var to point to the tool home
We can create an environment variable that will point to whichever path the tool home is set to, something like TOOL_HOME or TOOL_DATA (more forward-compatible, and detaches the concept of unix user home from the tool data directory).
Pros:
- Easy to implement
- Easy to change in the future
- Enables sooner migrations
Cons:
- Does not work if you use Linux user home retrieval methods (as it's not the home of the user)
- Potential future migration to whichever system we end up building (if it's mounted as a volume it might not need changes)
- The mismatch between the user the buildpacks expect (uid 1000) and the user actually running in the container might block the use of some buildpacks or force custom workarounds (e.g. scala/php)
- Like every time we introduce a breaking change, tools will be lost because nobody will update them.
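As a rough sketch of what option 1 could look like from a tool's point of view, the snippet below prefers a dedicated variable and falls back to $HOME for non-buildpack images. TOOL_DATA is one of the names floated above and is not an existing variable; the function name and the example paths are assumptions as well.

```python
import os
from pathlib import PurePosixPath


def tool_data_dir(environ=None):
    """Resolve the tool data directory without relying on $HOME alone.

    TOOL_DATA is the variable name proposed in this task (hypothetical);
    $HOME is kept as a fallback for non-buildpack images.
    """
    env = os.environ if environ is None else environ
    base = env.get("TOOL_DATA") or env.get("HOME")
    if not base:
        raise RuntimeError("neither TOOL_DATA nor HOME is set")
    return PurePosixPath(base)


# Buildpack image: $HOME is /app, the tool data lives elsewhere.
buildpack_dir = tool_data_dir({"HOME": "/app", "TOOL_DATA": "/data/project/mytool"})

# Non-buildpack image: only $HOME is set and already points to the tool home.
legacy_dir = tool_data_dir({"HOME": "/data/project/mytool"})
```

With a fallback like this, the same tool code could run in both image types during a migration, though it still would not help code that asks the operating system for the user's home directory.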
2: Disable NFS, only support secrets/envvars
Pros:
- Removes dependencies from outdated and hard-to-maintain technologies
- No future "remove NFS" migrations
- Lets us run the images with uid=1000 as they expect
Cons:
- Wait for the secrets/envvars service (not long)
- Increases barrier to migration
- Logs only from k8s containers (not persistent)
- No scratch storage (besides the container itself)
- No persistent storage
- Slower to cover current use cases
- Like every time we introduce a breaking change, tools will be lost because nobody will update them.
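For comparison, under an envvars-only approach a tool would read its credentials from the environment instead of from a file on NFS. This is only a sketch: the variable names TOOL_REPLICA_USER and TOOL_REPLICA_PASSWORD are placeholders invented for this example, since the secrets/envvars service is not defined in this task.

```python
import os


def replica_credentials(environ=None):
    """Read database credentials from environment variables.

    The variable names are hypothetical placeholders, not an agreed interface.
    """
    env = os.environ if environ is None else environ
    try:
        return env["TOOL_REPLICA_USER"], env["TOOL_REPLICA_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"missing credential variable: {missing.args[0]}") from None


# Example with an explicit mapping standing in for the container environment.
user, password = replica_credentials(
    {"TOOL_REPLICA_USER": "s12345", "TOOL_REPLICA_PASSWORD": "not-a-real-secret"}
)
```

This removes the dependency on $HOME for credentials, but every tool that currently reads replica.my.cnf would need a change like this one.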
3: Disable NFS, only support secrets/envvars and logs
Pros:
- Removes dependencies from outdated and hard-to-maintain technologies
- No future "remove NFS" migrations
- Lets us run the images with uid=1000 as they expect
Cons:
- Wait for the secrets/envvars service (not long)
- Wait for the logs service (long)
- Increases barrier to migration
- No scratch storage (besides the container itself)
- No persistent storage
- Slower to cover current use cases
- Like every time we introduce a breaking change, tools will be lost because nobody will update them.
4: Disable NFS, only support secrets/envvars, logs and storage
Pros:
- Removes dependencies from outdated and hard-to-maintain technologies
- No future "remove NFS" migrations
- Lets us run the images with uid=1000 as they expect
Cons:
- Wait for the secrets/envvars service (not long)
- Wait for the logs service (long)
- Wait for the storage service (very long)
- Increases barrier to migration
- Slower to cover current use cases
- Like every time we introduce a breaking change, tools will be lost because nobody will update them.
5: Customize the buildpack setup to not rely on a particular $HOME value
Pros:
- No more changes required from the community.
Cons:
- This requires us to build and maintain our own buildpacks to not use $HOME
- This requires us to build and maintain our own buildpacks to use a different user (not 1000, and dynamic, so maybe not doable)
- This requires us to build and maintain our own buildpacks for every language we want to support
- This requires us to build and maintain our own run, build and builder images
- No upstream community
- No interoperability with any other public or private cloud (vendor lock-in with us)
- Future-proof? TBD.
Add others here
See also
Related change: https://gerrit.wikimedia.org/r/c/cloud/toolforge/volume-admission-controller/+/901128