
[tbs] Define what NFS access to enable and how users will interact with it
Open, Needs Triage, Public

Description

This task is to gather ideas and define how buildpack-based images will access NFS and to what extent.

Problem

Buildpacks (specifically the upstream Heroku ones) expect the $HOME variable to be /app when running the built images.
Currently we use that variable to point to the tool home directory on NFS, which is the home of the tool's Unix user.

This changes the way users have been using that NFS directory:

Current dependencies on $HOME

For the beta, buildpack images have $HOME hardcoded to point to the app code (/app), while non-buildpack images point it to the NFS tool home directory.

Currently we set both $HOME and workingDir (the directory the process runs from).

Some pieces of code expect that the replica credentials will be under $HOME/replica.my.cnf, example:

Some code might expect to be able to log under $HOME
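
As an illustration of how tool code could stop depending on $HOME for these two cases, here is a minimal Python sketch. It assumes the TOOL_DATA_DIR variable added by change 919272 (see the Event Timeline) is set in the container and falls back to $HOME otherwise. The helper names (tool_data_dir, replica_cnf_path, log_path) are hypothetical and not part of any Toolforge library.

```python
import os
from pathlib import Path


def tool_data_dir(env=None):
    """Return the tool's data directory as a Path.

    Prefers TOOL_DATA_DIR and falls back to HOME, so the same code
    works in buildpack images (HOME=/app) and in non-buildpack images.
    """
    env = os.environ if env is None else env
    for name in ("TOOL_DATA_DIR", "HOME"):
        value = env.get(name)
        if value:
            return Path(value)
    raise RuntimeError("neither TOOL_DATA_DIR nor HOME is set")


def replica_cnf_path(env=None):
    """Path where the replica credentials file is expected to live."""
    return tool_data_dir(env) / "replica.my.cnf"


def log_path(filename, env=None):
    """Path for a log file stored next to the tool's data (hypothetical layout)."""
    return tool_data_dir(env) / filename
```

Passing an explicit env mapping makes the helpers easy to test without touching the real environment; called with no arguments they read os.environ.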

Options

1: Use a different env var to point to the tool home

We can create an environment variable that will point to whichever path the tool home is set to, something like TOOL_HOME or TOOL_DATA (more forward-compatible, and detaches the concept of unix user home from the tool data directory).

Pros:

  • Easy to implement
  • Easy to change in the future
  • Enables sooner migrations

Cons:

  • Does not work for code that looks up the home directory through Linux user methods (for example the passwd database), as the tool directory is not the home of the user
  • May require a future migration to whichever system we build (if it's mounted as a volume it might not need changes)
  • The mismatch between the user the buildpacks expect (uid 1000) and the user actually running in the container might block the use of some buildpacks or force custom workarounds (e.g. scala/php)
  • Like every time we introduce a breaking change, tools will be lost because nobody will update them.
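
The first con can be illustrated with a short Python sketch: a lookup through the passwd database never consults environment variables, so a new variable cannot change its answer. The function names are hypothetical, and TOOL_DATA_DIR is the variable name from change 919272 in the Event Timeline.

```python
import os
import pwd


def passwd_home():
    """Home directory from the passwd database, or None if the uid has no entry.

    This lookup ignores every environment variable, TOOL_DATA_DIR included.
    """
    try:
        return pwd.getpwuid(os.getuid()).pw_dir
    except KeyError:
        # Containers often run with a uid that has no passwd entry.
        return None


def env_home(env=None):
    """Directory taken from the environment: TOOL_DATA_DIR first, then HOME."""
    env = os.environ if env is None else env
    return env.get("TOOL_DATA_DIR") or env.get("HOME")
```

Note that os.path.expanduser("~") and pathlib.Path.home() read $HOME when it is set and only fall back to the passwd database otherwise, so in a buildpack image they would return /app rather than the tool directory.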

2: Disable NFS, only support secrets/envvars

Pros:

  • Removes dependencies on outdated and hard-to-maintain technologies
  • No future "remove NFS" migrations
  • Lets us run the images with uid=1000 as they expect

Cons:

  • Wait for the secrets/envvars service (not long)
  • Increases barrier to migration
  • Logs only from k8s containers (not persistent)
  • No scratch storage (besides the container itself)
  • No persistent storage
  • Slower to cover current use cases
  • Like every time we introduce a breaking change, tools will be lost because nobody will update them.
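
To show what "logs only from k8s containers" would mean for tool code, here is a hedged Python sketch that writes to a file only when a writable data directory is available and otherwise logs to stdout, where the container runtime can collect it. It assumes the TOOL_DATA_DIR variable from change 919272. The function name and the tool.log filename are hypothetical.

```python
import logging
import os
import sys


def build_handler(env=None, filename="tool.log"):
    """Pick a log handler based on whether a data directory is usable.

    With a writable TOOL_DATA_DIR the log goes to a file there;
    without one (for example with NFS disabled) it goes to stdout.
    """
    env = os.environ if env is None else env
    data_dir = env.get("TOOL_DATA_DIR")
    if data_dir and os.path.isdir(data_dir) and os.access(data_dir, os.W_OK):
        return logging.FileHandler(os.path.join(data_dir, filename))
    return logging.StreamHandler(sys.stdout)
```

A tool would attach the result with logging.getLogger().addHandler(build_handler()), keeping the rest of its logging calls unchanged under either option.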

3: Disable NFS, only support secrets/envvars and logs

Pros:

  • Removes dependencies on outdated and hard-to-maintain technologies
  • No future "remove NFS" migrations
  • Lets us run the images with uid=1000 as they expect

Cons:

  • Wait for the secrets/envvars service (not long)
  • Wait for the logs service (long)
  • Increases barrier to migration
  • No scratch storage (besides the container itself)
  • No persistent storage
  • Slower to cover current use cases
  • Like every time we introduce a breaking change, tools will be lost because nobody will update them.

4: Disable NFS, only support secrets/envvars, logs and storage

Pros:

  • Removes dependencies on outdated and hard-to-maintain technologies
  • No future "remove NFS" migrations
  • Lets us run the images with uid=1000 as they expect

Cons:

  • Wait for the secrets/envvars service (not long)
  • Wait for the logs service (long)
  • Wait for the storage service (very long)
  • Increases barrier to migration
  • Slower to cover current use cases
  • Like every time we introduce a breaking change, tools will be lost because nobody will update them.

5: Customize the buildpack setup to not rely on a particular $HOME value

Pros:

  • No more changes required from the community.

Cons:

  • This requires us to build and maintain our own buildpacks to not use $HOME
  • This requires us to build and maintain our own buildpacks to use a different user (not 1000, and dynamic, so maybe not doable)
  • This requires us to build and maintain our own buildpacks for every lang we want to support
  • This requires us to build and maintain our own run, build and builder images
  • No upstream community
  • No interoperability with any other public or private cloud (vendor lock-in with us)
  • Future-proof? TBD.

Add others here

See also

Related change: https://gerrit.wikimedia.org/r/c/cloud/toolforge/volume-admission-controller/+/901128

Event Timeline

Change 919271 had a related patch set uploaded (by David Caro; author: David Caro):

[cloud/toolforge/volume-admission-controller@main] admission: refactor in preparation for adding a new var

https://gerrit.wikimedia.org/r/919271

Change 919272 had a related patch set uploaded (by David Caro; author: David Caro):

[cloud/toolforge/volume-admission-controller@main] admission: Add TOOL_DATA_DIR env var

https://gerrit.wikimedia.org/r/919272

Sent a couple of patches to add the extra environment variable. For now that will help people using buildservice, and it is easy to maintain going forward.
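
For readers unfamiliar with how an admission controller can inject a variable, the following Python sketch builds the JSON Patch operations a Kubernetes mutating webhook could return to add TOOL_DATA_DIR to every container of a pod. It is not the code from the patches above. The function name, the tool_name parameter and the /data/project base path are assumptions made for illustration.

```python
def tool_data_dir_patch(pod, tool_name, base_dir="/data/project"):
    """Build JSON Patch operations that add TOOL_DATA_DIR to each container.

    `pod` is a pod manifest as a plain dict. Containers that already
    define TOOL_DATA_DIR are left untouched.
    """
    entry = {"name": "TOOL_DATA_DIR", "value": f"{base_dir}/{tool_name}"}
    ops = []
    containers = pod.get("spec", {}).get("containers", [])
    for index, container in enumerate(containers):
        env = container.get("env")
        if env is None:
            # No env list yet: create it with the single new entry.
            ops.append({
                "op": "add",
                "path": f"/spec/containers/{index}/env",
                "value": [dict(entry)],
            })
        elif not any(item.get("name") == "TOOL_DATA_DIR" for item in env):
            # Append to the existing list ("-" means end of array in JSON Patch).
            ops.append({
                "op": "add",
                "path": f"/spec/containers/{index}/env/-",
                "value": dict(entry),
            })
    return ops
```

Setting the variable at admission time means tools do not have to declare it themselves, which matches the "easy to maintain" point in the comment above.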

Change 919271 merged by jenkins-bot:

[cloud/toolforge/volume-admission-controller@main] admission: refactor in preparation for adding a new var

https://gerrit.wikimedia.org/r/919271

Change 919272 merged by jenkins-bot:

[cloud/toolforge/volume-admission-controller@main] admission: Add TOOL_DATA_DIR env var

https://gerrit.wikimedia.org/r/919272

Change 919800 had a related patch set uploaded (by David Caro; author: David Caro):

[cloud/toolforge/volume-admission-controller@main] toolsbeta: doploy latest image

https://gerrit.wikimedia.org/r/919800

Change 919800 merged by jenkins-bot:

[cloud/toolforge/volume-admission-controller@main] toolsbeta: doploy latest image

https://gerrit.wikimedia.org/r/919800

Change 919832 had a related patch set uploaded (by David Caro; author: David Caro):

[cloud/toolforge/volume-admission-controller@main] tools: deploy the latest image

https://gerrit.wikimedia.org/r/919832

Change 920259 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/software/tools-webservice@master] Add an option to disable NFS access

https://gerrit.wikimedia.org/r/920259

dcaro updated the task description.

Change 919832 merged by jenkins-bot:

[cloud/toolforge/volume-admission-controller@main] tools: deploy the latest image

https://gerrit.wikimedia.org/r/919832