This task is to gather ideas and define how buildpack-based images will access NFS and to what extent.
== Problem
Buildpacks (specifically the upstream Heroku ones) expect the $HOME variable to be /app when the built images run.
Currently we use that variable to point to the tool's home directory on NFS, which is also the home directory of the tool's Unix user.
Following the buildpack convention changes the way users have been using that NFS directory:
=== Current dependencies on $HOME
For the beta, buildpack images have $HOME hardcoded to point to the app code (/app), while non-buildpack images point it to the tool's home directory on NFS.
Currently we set both $HOME and workingDir (the directory the process runs from).
Some code expects the replica credentials to be at $HOME/replica.my.cnf, for example:
* https://gitlab.wikimedia.org/toolforge-repos/python-toolforge/-/blob/main/src/toolforge/__init__.py#L65
* https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/blob/main/etc/LocalSettings.php#L57-67
Some code might also expect to be able to write logs under $HOME.
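The following minimal sketch shows why the credentials lookup breaks in buildpack images. It is not copied from the repositories linked above; it only assumes the common pattern of expanding `~/replica.my.cnf`. On POSIX, Python's `os.path.expanduser` uses $HOME when it is set, so the resolved path follows whatever the image sets $HOME to. The tool name `mytool` and its path are illustrative.

```python
import os
from pathlib import Path


def replica_cnf_path() -> Path:
    """Path where code following the ~/replica.my.cnf convention looks for credentials.

    On POSIX, os.path.expanduser uses $HOME when it is set, so the result
    depends entirely on what the image sets $HOME to.
    """
    return Path(os.path.expanduser("~/replica.my.cnf"))


if __name__ == "__main__":
    # Non-buildpack image: $HOME is the tool home on NFS (illustrative path).
    os.environ["HOME"] = "/data/project/mytool"
    print(replica_cnf_path())  # /data/project/mytool/replica.my.cnf

    # Buildpack image: $HOME is hardcoded to the app code.
    os.environ["HOME"] = "/app"
    print(replica_cnf_path())  # /app/replica.my.cnf, where the credentials are not
```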
== Options
=== 1: Use a different env var to point to the tool home
We can create an environment variable that points to wherever the tool home is, something like `TOOL_HOME` or `TOOL_DATA` (the latter is more forward-compatible, as it detaches the concept of the Unix user's home from the tool's data directory).
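A minimal sketch of how tool code could consume such a variable, assuming it is named `TOOL_DATA` (a proposed name, not an existing variable) and falling back to $HOME so the same code keeps working in images that still point $HOME at the NFS tool home. The function names and the `mytool` path are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def tool_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the tool data directory.

    Prefers the hypothetical TOOL_DATA variable and falls back to $HOME, so it
    works both in buildpack images ($HOME=/app, TOOL_DATA set) and in images
    where $HOME still points at the NFS tool home.
    """
    env = os.environ if env is None else env
    # TOOL_DATA is a proposed name from this task, not an existing variable.
    value = env.get("TOOL_DATA") or env.get("HOME")
    if not value:
        raise RuntimeError("neither TOOL_DATA nor HOME is set")
    return Path(value)


def find_replica_cnf(env: Optional[Mapping[str, str]] = None) -> Path:
    """Locate replica.my.cnf inside the tool data directory."""
    return tool_data_dir(env) / "replica.my.cnf"


if __name__ == "__main__":
    # Buildpack image: $HOME is /app, TOOL_DATA points at the NFS tool home.
    print(find_replica_cnf({"HOME": "/app", "TOOL_DATA": "/data/project/mytool"}))
    # Image without the new variable: falls back to $HOME.
    print(find_replica_cnf({"HOME": "/data/project/mytool"}))
```

Falling back to $HOME keeps existing setups working while the variable is rolled out, and renaming it (for example to `TOOL_HOME`) only changes one lookup key.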
Pros:
* Easy to implement
* Easy to change in the future
* Allows migrations to start sooner
Cons:
* Does not work with code that retrieves the Linux user's home directory (e.g. from the passwd database), as the path is no longer the user's home
* Potential future migration to whatever system we build to replace NFS (if it is mounted as a volume, it might not need changes)
* The mismatch between the user buildpacks expect (uid 1000) and the user actually running in the container might block some buildpacks or force custom workarounds (e.g. Scala, PHP)
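The first and last cons can be illustrated with a sketch contrasting the two ways code usually finds a home directory: reading $HOME, or asking the passwd database for the current uid (via the Unix-only `pwd` module). A passwd lookup ignores $HOME and any new variable such as the proposed `TOOL_DATA`, and if the container runs as a uid that the image's passwd database does not list, the lookup fails outright. The function names are hypothetical.

```python
import os
import pwd
from typing import Optional


def home_from_env() -> Optional[str]:
    """Home directory as most runtimes see it: the $HOME environment variable."""
    return os.environ.get("HOME")


def home_from_passwd() -> Optional[str]:
    """Home directory recorded in the passwd database for the current uid.

    Environment variables ($HOME, or a new one like the proposed TOOL_DATA)
    have no effect on this lookup.
    """
    try:
        return pwd.getpwuid(os.getuid()).pw_dir
    except KeyError:
        # No passwd entry for this uid, e.g. when the container runs as a uid
        # other than the one the image was built for.
        return None


if __name__ == "__main__":
    os.environ["TOOL_DATA"] = "/data/project/mytool"  # hypothetical variable
    print("from $HOME: ", home_from_env())
    print("from passwd:", home_from_passwd())
```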
=== 2: Disable NFS, only support secrets/envvars
Pros:
* Removes dependencies on outdated and hard-to-maintain technologies
* No future "remove NFS" migrations
* Lets us run the images with uid=1000 as they expect
Cons:
* Requires waiting for the secrets/envvars service (not long)
* Raises the barrier to migration
* Logs only available from the k8s containers
* No scratch storage (besides the container itself)
* No persistent storage
* Slower to cover current use cases
=== 3: Disable NFS, only support secrets/envvars and logs
Pros:
* Removes dependencies on outdated and hard-to-maintain technologies
* No future "remove NFS" migrations
* Lets us run the images with uid=1000 as they expect
Cons:
* Requires waiting for the secrets/envvars service (not long)
* Requires waiting for the logs service (long)
* Raises the barrier to migration
* No scratch storage (besides the container itself)
* No persistent storage
* Slower to cover current use cases
=== 4: Disable NFS, only support secrets/envvars, logs and storage
Pros:
* Removes dependencies on outdated and hard-to-maintain technologies
* No future "remove NFS" migrations
* Lets us run the images with uid=1000 as they expect
Cons:
* Requires waiting for the secrets/envvars service (not long)
* Requires waiting for the logs service (long)
* Requires waiting for the storage service (very long)
* Raises the barrier to migration
* Slower to cover current use cases
=== Add others here
== See also
Related change: https://gerrit.wikimedia.org/r/c/cloud/toolforge/volume-admission-controller/+/901128