Specifically, alerts defined in [0] such as PHPFPMTooBusy aggregate by the k8s deployment label.
During turnup of the -next and -migration releases to support 8.1, we left this label (as before) equal to the namespace name [1], largely on the basis that these releases are part of the same logical service (and are readily differentiated by other means - e.g., the release label, the servergroup tag, etc.).
However, that means the alerts inappropriately aggregate them together with the other releases in the same namespace, which is not what we want.
Instead, we can either:
- Extend the mediawiki chart to permit overriding the label value in some sensible way, and then use it (e.g., set to mw-web-next in the -next deployment of mw-web). This has the downside that a number of other places where we use this label would need to be updated - e.g., service dashboards in Grafana.
- Update the alert signal expressions in mw-on-k8s.yaml to also group by release. This has the side effect that, e.g., main and canary can alert independently (which, now that I think about it, is probably something we want if indeed the canary is a canary).
- Something else??
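To illustrate what grouping by release would change, here is a rough Python sketch. It is not the actual expression from mw-on-k8s.yaml: the sample values, the busy/total worker fields, and the assumption that the alert is a ratio of summed values are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical samples: (labels, busy_workers, total_workers).
# Label values and numbers are invented for illustration only.
SAMPLES = [
    ({"deployment": "mw-web", "release": "main"}, 40, 100),
    ({"deployment": "mw-web", "release": "canary"}, 4, 10),
    ({"deployment": "mw-web", "release": "next"}, 9, 10),
]


def busy_ratio(samples, group_by):
    """Sum busy and total workers per label group, then return busy/total per group."""
    busy = defaultdict(int)
    total = defaultdict(int)
    for labels, b, t in samples:
        # The group key is the tuple of values for the labels we aggregate by.
        key = tuple(labels[name] for name in group_by)
        busy[key] += b
        total[key] += t
    return {key: busy[key] / total[key] for key in total}


# Current behaviour: every release in the namespace collapses into one series.
by_deployment = busy_ratio(SAMPLES, ["deployment"])

# Second option: one series per (deployment, release) pair.
by_release = busy_ratio(SAMPLES, ["deployment", "release"])
```

With these made-up numbers, the deployment-level ratio comes out at roughly 44%, which hides the fact that the next release is at 90% on its own. Grouping by release exposes it as a separate series, and likewise lets main and canary cross a threshold independently.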
I'm inclined to reach for the second option, but wanted to get your take @jijiki before proceeding.