Right now, our structure has several flaws, namely:
- Extreme repetition
- No centralized place to define common values (say, the logstash hostname or the version of envoy we want to install), which leads to horrors such as https://gerrit.wikimedia.org/r/q/topic:%22envoy_1.14.4%22+(status:open%20OR%20status:merged) whenever SREs need to change something across the board
- The mechanics of releasing are cumbersome (you have to change directory, source a file, and run helmfile sync for *every* cluster you want to release to)
- Releasing has several nice footguns; for example, if you forget to source the correct environment variables, you risk applying the values of one cluster to another without noticing

Our goals for the refactor are to:
- Remove the ambiguity between the cluster we're working on and the values we're applying: it should not be possible to apply codfw values to eqiad, or vice versa, by mistake.
- Reduce repetition to the bare minimum that makes sense
- Allow easier management of SRE-driven changes/deployments
- Simplify the release process for users