Problem
I am running a test (T261009) on mwdebug1001 that has severely degraded performance there, and this has unfortunately triggered alerts. This has happened before: I test something on a debug server, and we get alerts from it. So the question is: does it make sense to stop mwdebug* servers from contributing to latency and error metrics, and from firing the corresponding alerts?
Proposal
Create a separate "debug" MediaWiki cluster with its own dashboards, where engineers can look when testing changes there. In other words, remove the mwdebug* hosts from the "appserver" cluster and create a new cluster for them. The goal is to decouple mwdebug* metrics and alerts from production.
Pros:
- Better visibility when testing
- Errors on mwdebug* will not fire alerts
- mwdebug* metrics will not contribute to production metrics
Cons:
- Scap needs to adapt to this, so that the overall error rate does not prevent scap from moving a deployment forward while someone is testing on mwdebug* (Release-Engineering-Team)
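To illustrate the split described in the proposal, here is a minimal sketch of assigning hosts to a metrics cluster by hostname. It is not how Puppet or the monitoring stack actually does it: the function names are hypothetical, the "mw1414" hostname is made up for the example, and only the mwdebug* pattern and the "appserver"/"debug" cluster names come from this task.

```python
from fnmatch import fnmatchcase

# Pattern taken from the task description; everything else is hypothetical.
DEBUG_HOST_PATTERN = "mwdebug*"


def cluster_for(hostname: str) -> str:
    """Return the metrics cluster a host would report under."""
    if fnmatchcase(hostname, DEBUG_HOST_PATTERN):
        return "debug"
    return "appserver"


def split_by_cluster(hostnames):
    """Group hostnames by cluster so each group can feed its own dashboards."""
    clusters = {"appserver": [], "debug": []}
    for host in hostnames:
        clusters[cluster_for(host)].append(host)
    return clusters


if __name__ == "__main__":
    # "mw1414" is an illustrative production hostname, not taken from the task.
    print(split_by_cluster(["mwdebug1001", "mw1414"]))
```

With a split like this, alerts and aggregate latency would be computed per cluster, so a slow mwdebug host would only show up on the "debug" dashboards.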
Actionables:
- Implement X-Analytics: debug=1 on the analytics side
- Implement X-Analytics: debug=1 in VCL
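As a rough illustration of what the analytics side of these actionables could look like, the sketch below drops requests tagged with debug=1 before aggregating latency. It assumes the X-Analytics header is a semicolon-separated list of key=value pairs; the function names, the "https=1" field, and the sample data are hypothetical and not part of any existing pipeline.

```python
def parse_x_analytics(header: str) -> dict:
    """Parse an X-Analytics value, assumed to look like 'key1=val1;key2=val2'."""
    fields = {}
    for pair in header.split(";"):
        key, sep, value = pair.strip().partition("=")
        if sep and key:  # skip empty or malformed segments
            fields[key] = value
    return fields


def is_debug_request(header: str) -> bool:
    """True when the request was tagged with debug=1."""
    return parse_x_analytics(header).get("debug") == "1"


def production_latencies(requests):
    """Keep latencies only for requests that are not tagged as debug.

    `requests` is an iterable of (x_analytics_header, latency_ms) tuples.
    """
    return [latency for header, latency in requests if not is_debug_request(header)]


if __name__ == "__main__":
    sample = [("https=1;debug=1", 9000.0), ("https=1", 120.0)]
    print(production_latencies(sample))
```

The VCL side would be responsible for setting the tag on requests routed to mwdebug*; this sketch only covers how a consumer might act on it.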
This task will continue in T276994, since we are migrating to k8s.