Problem statement:
- Messages aren't in one place in Logstash, making feature-based dashboards harder to build.
- Shared dashboards used by deployers and SRE are harder to audit when the channel named after the component is missing messages.
- Quick summaries are more meaningful when messages aren't spread across multiple channels for the same component. E.g. a spike can more easily be identified this way, especially over long periods of time.
- Fewer mistakes and less ambiguity when adding CentralAuth (CA) code or editing wmf-config.
Currently we have:
- CentralAuth - most messages
- CentralAuthVerbose - "on each page view" debug information containing personal names.
- CentralAuthRename - rename subcomponent diagnostics.
- suppressjob - Job queue job, CentralAuthSuppressUserJob.
- CentralAuthSULRename - CLI script, forceRenameUsers.php.
And configured as:
- CentralAuth - all severity levels enabled in prod, including "debug" and "info".
- CentralAuthVerbose - disabled in prod, enabled in beta cluster.
- CentralAuthRename - fully enabled in prod, including "debug" and "info".
- suppressjob - disabled.
- CentralAuthSULRename - disabled.
I propose the following:
- Rename CentralAuthRename to CentralAuth, and elevate any messages that should remain enabled by default in prod to "info" level or above.
- Audit CentralAuth messages and elevate anything we want enabled in prod to "info" or above.
- Rename CentralAuthVerbose to CentralAuth using debug level.
- Rename suppressjob and CentralAuthSULRename to CentralAuth using debug level.
- Simplify the config to CentralAuth: debug in beta and CentralAuth: info in prod.
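
To make the effect of the proposal concrete, here is a rough Python sketch of how messages would be routed and filtered after consolidation. It is only an illustration, not actual CentralAuth or wmf-config code: the names `consolidate`, `is_emitted`, `DEMOTE_TO_DEBUG`, `RENAME_ONLY` and `THRESHOLDS` are made up for this example, and Python's `logging` level constants stand in for the real severity levels.

```python
import logging
from typing import Tuple

TARGET_CHANNEL = "CentralAuth"

# Legacy channels whose messages would be logged at debug level (hypothetical grouping).
DEMOTE_TO_DEBUG = {"CentralAuthVerbose", "suppressjob", "CentralAuthSULRename"}

# Channels that are only renamed; each message keeps its (audited) level.
RENAME_ONLY = {"CentralAuth", "CentralAuthRename"}

# Proposed minimum level per environment.
THRESHOLDS = {"beta": logging.DEBUG, "prod": logging.INFO}


def consolidate(channel: str, level: int) -> Tuple[str, int]:
    """Map a legacy (channel, level) pair to the proposed single channel."""
    if channel in DEMOTE_TO_DEBUG:
        return TARGET_CHANNEL, logging.DEBUG
    if channel in RENAME_ONLY:
        return TARGET_CHANNEL, level
    raise ValueError(f"unknown channel: {channel}")


def is_emitted(env: str, channel: str, level: int) -> bool:
    """Return True if the message would pass the environment's threshold."""
    _, new_level = consolidate(channel, level)
    return new_level >= THRESHOLDS[env]
```

Under these assumptions, a CentralAuthRename message stays visible in prod only if it is at "info" or above, while everything from CentralAuthVerbose, suppressjob and CentralAuthSULRename shows up in beta only.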