[envvars] scope to jobs/components
Open, Medium, Public, Feature

Description

Feature summary (what you would like to be able to do and where):

Currently any envvar is accessible to all pods running within the tool account.

When multiple components, each with its own configuration, run together to provide a service, there are two risks. The first is conflicting envvars: how do you specify 2 different databases for 2 different components without encoding deployment information (e.g. their names) in code? The second is leaking secrets: given a component serving public traffic that talks to a backend which uses secrets to interact with e.g. enwiki, a compromise of the frontend component should not expose the secrets for the backend component.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

Any tool which executes more than 1 job/component within a tool account.

There is no 'supported' way to do inter-tool traffic other than routing it through the webproxy, which has limited support (jobs/components) and experiences regular service degradation.

It is technically possible to hit tools directly (using their service names), but this is problematic because the services are deleted/re-created on deploys (causing NXDOMAIN caching), and it is "not supported" as it relies on the cluster DNS/communication, which "may one day become something else".

In the same vein as reuse_from in components-api, and similar to how this would be done with label matching in Kubernetes, there does need to be some flexibility in how envvars are matched to jobs/components.

Take for example cluebotng-review - there are 13 scheduled jobs (components) which share the same configuration as the main continuous job and the helper continuous job. Then there are 2 jobs that need no secrets, though 1 of them does need runtime config (T405018). Finally there are 3 jobs which each have their own secrets & runtime config.

I would expect to have 5 "groups" of envvars, each loaded into the relevant runtimes listed above.
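
The grouping described above could be sketched roughly as follows. This is a hypothetical illustration only: the `EnvvarGroup` type, the glob-style selectors, and all component and variable names are assumptions, not an existing envvars, jobs-api or components-api feature.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class EnvvarGroup:
    """Hypothetical: a named set of envvars scoped to matching components."""
    name: str
    selectors: list[str]      # glob patterns matched against component names
    envvars: dict[str, str]


def envvars_for(component: str, groups: list[EnvvarGroup]) -> dict[str, str]:
    """Return only the envvars whose group selects this component."""
    resolved: dict[str, str] = {}
    for group in groups:
        if not any(fnmatchcase(component, pattern) for pattern in group.selectors):
            continue
        # Refuse ambiguous configuration instead of silently overriding.
        clash = resolved.keys() & group.envvars.keys()
        if clash:
            raise ValueError(
                f"{component}: {sorted(clash)} defined in more than one group"
            )
        resolved.update(group.envvars)
    return resolved


# Made-up component names and values, purely for illustration.
GROUPS = [
    EnvvarGroup("shared", ["web", "helper", "cron-*"], {"DATABASE_URL": "db-main"}),
    EnvvarGroup(
        "backend-only",
        ["backend"],
        {"DATABASE_URL": "db-backend", "WIKI_TOKEN": "secret"},
    ),
    EnvvarGroup("runtime-config", ["no-secrets-a"], {"LOG_LEVEL": "debug"}),
]

print(envvars_for("cron-nightly", GROUPS))  # only the shared group applies
print(envvars_for("backend", GROUPS))       # own database and secret, no clash
print(envvars_for("no-secrets-b", GROUPS))  # no group selects it: nothing injected
```

With scoping like this the same name (`DATABASE_URL` here) can hold different values for different components, and a component that no group selects receives nothing.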

It would also be useful to allow cross-tool secrets. There are many use cases for this, and it is an extension of component limitation/selection:

  • Provide secrets to other tools consuming your services e.g. API keys (this is done by hand for managed things like Elasticsearch today)
  • Share common access keys/settings across groups of tools - for example alloy configuration hitting a common monitoring tool

Benefits (why should this be implemented?):

  • Improved security, less access to secrets
  • Improved maintainability, no requirement to work around conflicting names
  • Improved clarity, no wondering if an entry is legacy, no side effects where a change for 1 component impacts another component later (e.g. after a restart)
  • Options for components-api/jobs-api to validate that environment variables exist, providing quicker and more meaningful feedback to users rather than pods failing to start
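
The validation mentioned in the last point might look something like the sketch below. It assumes a component can declare which envvars it requires and that the API knows which envvars are in scope for it; neither exists today, and the function and variable names are made up for illustration.

```python
def missing_envvars(required: set[str], available: set[str]) -> list[str]:
    """Names a component requires that are not defined in its scope."""
    return sorted(required - available)


def validate_component(name: str, required: set[str], available: set[str]) -> None:
    """Fail at submission time rather than when the pod tries to start."""
    missing = missing_envvars(required, available)
    if missing:
        raise ValueError(
            f"component {name!r} references undefined envvars: {', '.join(missing)}"
        )


# Hypothetical component and envvar names.
try:
    validate_component("backend", {"WIKI_TOKEN", "DATABASE_URL"}, {"DATABASE_URL"})
except ValueError as error:
    print(error)
```

The user would see the error when creating or updating the job/component, instead of finding a pod stuck later.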

Event Timeline

Restricted Application added a subscriber: Aklapper.

Today, when removing an envvar used by grafana-alloy, 2 unrelated containers got stuck in CreateContainerConfigError because the admission policy had set their environment to point to something that no longer exists:

Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Failed     41m                   kubelet  Error: secret "toolforge.envvar.v1.alloy-scrape-targets" not found

Both jobs had to be restarted to clear out the environment variable and bring things back online. The envvar in question is not used or required by these jobs.