Page MenuHomePhabricator

[toolforge] Create a secrets management offering to avoid storing on NFS
Closed, ResolvedPublic

Description

This is a task to brainstorm ideas on how to evolve the current secrets management (plain text files on NFS) to a solution that will allow us more flexibility to change the implementation without the users needing to change any code, with that, then we can move it away from plain NFS and onto somewhere else.

Some starting ideas, focusing on the user story side, without going too deep into how to implement it:

Toolforge secrets, files and env vars

Create a toolforge cli for secrets (toolforge secret), that handles the secret management (probably as a cli-api combo)

As a first implementation, it could use k8s secrets on the tool namespace as backend.

That could read stdin, to allow things like "<mysecret.file toolforge secrets create --name='awesome_secret' --type=file --data=-" that creates the secret from an existing file.

There could be two types of secrets (following k8s secrets), files and env vars.

Common fields:

  • name
  • value

File-only fields:

  • mountpath: we could use a generated one instead, so it would not be needed. I would push towards having an associated env var, like SECRET_<name>_PATH, and users having to read the path from there in any case, that allows us (and them!) to not rely on any hardcodded path.

As user stories, that would be something like:

  • as a tool, I want to be able to create an environment variable with secret data, and use it on my tool
  • as a tool, I want to be able to create a file with secret data, and use it on my tool

Toolforge secrets, only env vars

Create a toolforge cli for secrets (toolforge secret), that handles the secret management.
Create an alias toolforge envvar (or the other way around, not really relevant which one is an alias to which).

This version will only support environment variables (following https://12factor.net).

As a first implementation, it could use k8s secrets on the tool namespace as backend.

Fields:

  • name
  • value

As user stories, that would be something like:

  • as a tool, I want to be able to create an environment variable with secret data, and use it on my tool
NOTE: If at any point we change the implementation for envvars and secret, we can just swap the alias with the new implementation.

Note

I'll move to a decision task at the end of the week.

Event Timeline

dcaro renamed this task from [toolforge] Re-invent the secrets management to avoid storing on NFS to [toolforge] Create a secrets management offering to avoid storing on NFS.Apr 12 2023, 12:00 PM
dcaro created this task.

Heroku's configuration implementation is 12-factor apps style environment variables. Their platform also includes non-public git storage that could be used for files which may simplify things for them. Their user exposed api mapped to our naming conventions might be something like:

$ toolforge envvar list
$ toolforge envvar get <name>
$ toolforge envvar set <name>=<value>
$ toolforge envvar unset <name>

Separating envvar storage from file storage may make implementation easier. I think it would also be likely to lead to less confusing CLI/API commands by keeping the related, but distinct actions separate.

Heroku's configuration implementation is 12-factor apps style environment variables. Their platform also includes non-public git storage that could be used for files which may simplify things for them. Their user exposed api mapped to our naming conventions might be something like:

$ toolforge envvar list
$ toolforge envvar get <name>
$ toolforge envvar set <name>=<value>
$ toolforge envvar unset <name>

Separating envvar storage from file storage may make implementation easier. I think it would also be likely to lead to less confusing CLI/API commands by keeping the related, but distinct actions separate.

Hmm, I would be confused to know if those would be secret or not, clearly distinguishing between secret and non-secret things makes my brain calmer when setting those things up, specially for files (my mind directly maps it to configmap/container.spec.env vs secret on k8s).

I would be ok with the above if we actually implemented the 12-factor apps style you mention, so config only comes through in env vars, so no files support, that sounds nice (bigger change for our users though).

Hmm, I would be confused to know if those would be secret or not, clearly distinguishing between secret and non-secret things makes my brain calmer when setting those things up, specially for files

I don't really see the use case for non-secret deploy time configuration directives. Public things can be put directly in the git repo. Things that cannot be put into the public git repo are by definition secrets in my view.

(my mind directly maps it to configmap/container.spec.env vs secret on k8s).

Worrying about a direct mapping to Kubernetes primitives sounds like a violation of "a solution that will allow us more flexibility to change the implementation without the users needing to change any code".

Worrying about a direct mapping to Kubernetes primitives sounds like a violation of "a solution that will allow us more flexibility to change the implementation without the users needing to change any code".

Completely agree, the point was that if something does not say "secret", I do not expect it to be secret. So instead on envvars, sercretvars would be a better, less scary name.

Explicit better than implicit, specially for potentially risky things.

While at kubecon I've started playing with automatic generation of code from swagger yaml files, and created a minimal shell for this service:

https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-secrets-api/-/tree/kubecon_between_sessions_push

That implements both the api and the cli boilerplate, with many out-of-the-box features like parameter validation for the API, or autocompletion for the cli.

Note that the implementation details are just a working stub, pending to be properly implemented when we decide what to do in this task, the interesting part was how easy was to get the boilerplate (spent way more time trying to get the deployment yamls to work xd).

@Slst2020 mentioned in chat the other day an idea that might solve those issues.

We can have both a 'toolforge envvar' and 'toolforge secret' and both just be an alias of the other, that prevents confusing the users if 'envvars' is secret or not.

I'll try to mock up the code for that.

For this though, we will need either a new admission controller or to modify webservice to inject the environment vars to the containers on pod creation (probably the controller might make more sense, as that will allow users using kubectl to get the same behavior, but that is a bit more complicated, as there's a new piece needed for the whole to work).

Just added it to the code: https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-secrets-api/-/tree/kubecon_between_sessions_push?ref_type=heads

There's now a script scrits/toolforge-envvar that will do that for you, changed a bit the subcommands to be more coherent (get instead of getSecret). The help strings still need updating though, but works good. I think it's a good option.

dcaro claimed this task.

Moved the discussion to a decision request here T335979: Decision request - Toolforge envvars/secrets service.

Feel free to add more info/options/etc. there