[toolforge] Create a secrets management offering to avoid storing on NFS
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	dcaro
	Apr 12 2023, 12:00 PM

Description

This is a task to brainstorm ideas on how to evolve the current secrets management (plain text files on NFS) to a solution that will allow us more flexibility to change the implementation without the users needing to change any code, with that, then we can move it away from plain NFS and onto somewhere else.

Some starting ideas, focusing on the user story side, without going too deep into how to implement it:

Toolforge secrets, files and env vars

Create a toolforge cli for secrets (toolforge secret), that handles the secret management (probably as a cli-api combo)

As a first implementation, it could use k8s secrets on the tool namespace as backend.

That could read stdin, to allow things like "<mysecret.file toolforge secrets create --name='awesome_secret' --type=file --data=-" that creates the secret from an existing file.

There could be two types of secrets (following k8s secrets), files and env vars.

Common fields:

name
value

File-only fields:

mountpath: we could use a generated one instead, so it would not be needed. I would push towards having an associated env var, like SECRET_<name>_PATH, and users having to read the path from there in any case, that allows us (and them!) to not rely on any hardcodded path.

As user stories, that would be something like:

as a tool, I want to be able to create an environment variable with secret data, and use it on my tool
as a tool, I want to be able to create a file with secret data, and use it on my tool

Toolforge secrets, only env vars

Create a toolforge cli for secrets (toolforge secret), that handles the secret management.
Create an alias toolforge envvar (or the other way around, not really relevant which one is an alias to which).

This version will only support environment variables (following https://12factor.net).

As a first implementation, it could use k8s secrets on the tool namespace as backend.

Fields:

name
value

As user stories, that would be something like:

as a tool, I want to be able to create an environment variable with secret data, and use it on my tool

NOTE: If at any point we change the implementation for envvars and secret, we can just swap the alias with the new implementation.

Note

I'll move to a decision task at the end of the week.

Related Objects
Search...

Status	Subtype	Assigned	Task
Resolved		LucasWerkmeister	T320140 Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes
Resolved		matmarex	T319707 Migrate dtcheck from Toolforge GridEngine to Toolforge Kubernetes
Resolved		Legoktm	T320062 Migrate steve-adder from Toolforge GridEngine to Toolforge Kubernetes
Resolved		Legoktm	T320011 Migrate rfa-voting-history from Toolforge GridEngine to Toolforge Kubernetes
Open		dcaro	T194332 [Epic,builds-api,components-api,webservice,jobs-api] Make Toolforge a proper platform as a service with push-to-deploy and build packs
Resolved		dcaro	T267374 [tbs.beta] Create a toolforge build service beta release
Open	Feature	None	T339940 Support reading Wiki Replica/ToolsDB credentials from envvars
Resolved		dcaro	T265691 [tbs.maintaindbusers] Update maintain-dbusers to create an envvar with replica.my.cnf to be used by buildpack images
Resolved		dcaro	T334578 [toolforge] Create a secrets management offering to avoid storing on NFS
Resolved		dcaro	T337538 [toolforge] Create envvars management offering

Event Timeline

dcaro renamed this task from [toolforge] Re-invent the secrets management to avoid storing on NFS to [toolforge] Create a secrets management offering to avoid storing on NFS.Apr 12 2023, 12:00 PM

dcaro created this task.

fnegri subscribed.Apr 12 2023, 12:38 PM

Heroku's configuration implementation is 12-factor apps style environment variables. Their platform also includes non-public git storage that could be used for files which may simplify things for them. Their user exposed api mapped to our naming conventions might be something like:

$ toolforge envvar list
$ toolforge envvar get <name>
$ toolforge envvar set <name>=<value>
$ toolforge envvar unset <name>

Separating envvar storage from file storage may make implementation easier. I think it would also be likely to lead to less confusing CLI/API commands by keeping the related, but distinct actions separate.

In T334578#8777376, @bd808 wrote:
Heroku's configuration implementation is 12-factor apps style environment variables. Their platform also includes non-public git storage that could be used for files which may simplify things for them. Their user exposed api mapped to our naming conventions might be something like:
$ toolforge envvar list
$ toolforge envvar get <name>
$ toolforge envvar set <name>=<value>
$ toolforge envvar unset <name>
Separating envvar storage from file storage may make implementation easier. I think it would also be likely to lead to less confusing CLI/API commands by keeping the related, but distinct actions separate.

Hmm, I would be confused to know if those would be secret or not, clearly distinguishing between secret and non-secret things makes my brain calmer when setting those things up, specially for files (my mind directly maps it to configmap/container.spec.env vs secret on k8s).

I would be ok with the above if we actually implemented the 12-factor apps style you mention, so config only comes through in env vars, so no files support, that sounds nice (bigger change for our users though).

aborrero subscribed.Apr 13 2023, 11:08 AM

In T334578#8778547, @dcaro wrote:

Hmm, I would be confused to know if those would be secret or not, clearly distinguishing between secret and non-secret things makes my brain calmer when setting those things up, specially for files

I don't really see the use case for non-secret deploy time configuration directives. Public things can be put directly in the git repo. Things that cannot be put into the public git repo are by definition secrets in my view.

(my mind directly maps it to configmap/container.spec.env vs secret on k8s).

Worrying about a direct mapping to Kubernetes primitives sounds like a violation of "a solution that will allow us more flexibility to change the implementation without the users needing to change any code".

In T334578#8779742, @bd808 wrote:

Worrying about a direct mapping to Kubernetes primitives sounds like a violation of "a solution that will allow us more flexibility to change the implementation without the users needing to change any code".

Completely agree, the point was that if something does not say "secret", I do not expect it to be secret. So instead on envvars, sercretvars would be a better, less scary name.

Explicit better than implicit, specially for potentially risky things.

While at kubecon I've started playing with automatic generation of code from swagger yaml files, and created a minimal shell for this service:

https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-secrets-api/-/tree/kubecon_between_sessions_push

That implements both the api and the cli boilerplate, with many out-of-the-box features like parameter validation for the API, or autocompletion for the cli.

Note that the implementation details are just a working stub, pending to be properly implemented when we decide what to do in this task, the interesting part was how easy was to get the boilerplate (spent way more time trying to get the deployment yamls to work xd).

@Slst2020 mentioned in chat the other day an idea that might solve those issues.

We can have both a 'toolforge envvar' and 'toolforge secret' and both just be an alias of the other, that prevents confusing the users if 'envvars' is secret or not.

I'll try to mock up the code for that.

For this though, we will need either a new admission controller or to modify webservice to inject the environment vars to the containers on pod creation (probably the controller might make more sense, as that will allow users using kubectl to get the same behavior, but that is a bit more complicated, as there's a new piece needed for the whole to work).

Just added it to the code: https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-secrets-api/-/tree/kubecon_between_sessions_push?ref_type=heads

There's now a script scrits/toolforge-envvar that will do that for you, changed a bit the subcommands to be more coherent (get instead of getSecret). The help strings still need updating though, but works good. I think it's a good option.

dcaro updated the task description. (Show Details)May 3 2023, 8:57 AM