
Unify CentralAuth log channels
Closed, Resolved (Public)

Description

Problem statement:

  • Messages aren't in one place in Logstash, making feature-based dashboards harder to build.
  • Shared dashboards used by deployers and SRE are harder to audit when the channel named after the component is missing messages.
  • Quick summaries are more meaningful when messages aren't spread across multiple channels for the same component. E.g. a spike can more easily be identified this way, especially over long periods of time.
  • Fewer mistakes and less ambiguity when adding CA code or editing wmf-config.

Currently we have:

  • CentralAuth - most messages
  • CentralAuthVerbose - "on each page view" debug information containing personal names.
  • CentralAuthRename - rename subcomponent diagnostics.
  • suppressjob - Job queue job, CentralAuthSuppressUserJob.
  • CentralAuthSULRename - CLI script, forceRenameUsers.php.

And configured as:

  • CentralAuth - all severity levels enabled in prod, including "debug" and "info".
  • CentralAuthVerbose - disabled in prod, enabled in beta cluster.
  • CentralAuthRename - fully enabled in prod, including "debug" and "info".
  • suppressjob - disabled.
  • CentralAuthSULRename - disabled.

I propose the following:

  • Rename CentralAuthRename to CentralAuth, and elevate any messages that should remain enabled by default in prod to "info" level or above.
  • Audit CentralAuth messages and elevate anything to "info" or above that we want enabled in prod.
  • Rename CentralAuthVerbose to CentralAuth using debug level.
  • Rename suppressjob and CentralAuthSULRename to CentralAuth using debug level.
  • Simplify beta config as CentralAuth: debug and prod config as CentralAuth: info.
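The effect of the last bullet can be sketched as follows. This is only an illustrative model in Python's standard `logging` module, not the actual wmf-config (which is PHP); the `CHANNEL_LEVELS` table, `ListHandler`, and `make_channel` are hypothetical names introduced for this example.

```python
import logging

# Hypothetical per-environment minimum level for the single unified channel,
# mirroring "beta: CentralAuth: debug" and "prod: CentralAuth: info".
CHANNEL_LEVELS = {
    "beta": {"CentralAuth": logging.DEBUG},
    "prod": {"CentralAuth": logging.INFO},
}


class ListHandler(logging.Handler):
    """Collects (level name, message) pairs so the result can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


def make_channel(env, channel="CentralAuth"):
    """Build a standalone logger for one channel with the env's minimum level."""
    logger = logging.Logger(channel)  # not registered in the global logger tree
    logger.setLevel(CHANNEL_LEVELS[env][channel])
    handler = ListHandler()
    logger.addHandler(handler)
    return logger, handler


def demo(env):
    logger, handler = make_channel(env)
    # Formerly CentralAuthVerbose: per-page-view detail, now plain debug.
    logger.debug("session check on page view")
    # Formerly CentralAuthRename: elevated so it stays visible in prod.
    logger.info("global rename started")
    logger.error("global rename failed")
    return handler.records
```

With this model, `demo("prod")` keeps only the "info" and "error" entries, while `demo("beta")` keeps all three, so the former sub-channels differ only by severity rather than by channel name.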

Event Timeline

Change 812477 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/CentralAuth@master] Adopt PSR-3 logger, use levels, and unify channels

https://gerrit.wikimedia.org/r/812477

Change 812478 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] Limit "CentralAuth" log channel to level=info and above

https://gerrit.wikimedia.org/r/812478

Change 812479 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] Remove unused 'CentralAuthRename' log config

https://gerrit.wikimedia.org/r/812479

Change 812477 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Adopt PSR-3 logger, use levels, and unify channels

https://gerrit.wikimedia.org/r/812477

Change 812478 merged by jenkins-bot:

[operations/mediawiki-config@master] Limit "CentralAuth" log channel to level=info and above

https://gerrit.wikimedia.org/r/812478

Change 812479 merged by jenkins-bot:

[operations/mediawiki-config@master] Remove unused 'CentralAuthRename' log config

https://gerrit.wikimedia.org/r/812479

Krinkle triaged this task as Medium priority. Sep 6 2022, 5:03 PM
Krinkle updated the task description.
Krinkle added a project: Performance-Team.
Krinkle updated the task description.
Tgr subscribed.

CentralAuthRename was very helpful in searching for rename errors, something common enough that it has its own wikitech page. There are maybe a few hundred rename-related messages a month, and about a hundred million authentication-related ones, so with everything in a single channel, finding the right messages is somewhat hard.

I can see the value in strictly following extension names, but we should restore the differentiation somehow.
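One way to restore that differentiation inside a single channel is to tag entries with a structured field and filter on it. The sketch below models the idea with Python's standard `logging` module; the `component` field name, `CollectingHandler`, and `by_component` are assumptions made for illustration and do not describe how CentralAuth actually marks its entries.

```python
import logging


class CollectingHandler(logging.Handler):
    """Keeps emitted records in memory, standing in for a log store."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def by_component(records, component):
    """Return the messages whose (hypothetical) 'component' field matches."""
    return [
        r.getMessage()
        for r in records
        if getattr(r, "component", None) == component
    ]


# A single unified channel; the logger is standalone to avoid global state.
logger = logging.Logger("CentralAuth")
logger.setLevel(logging.DEBUG)
handler = CollectingHandler()
logger.addHandler(handler)

# Authentication traffic carries no tag in this sketch.
logger.info("login succeeded")
logger.info("login succeeded")
# Rename entries carry an extra structured field.
logger.error("rename failed on one wiki", extra={"component": "rename"})

rename_messages = by_component(handler.records, "rename")
```

Here `rename_messages` contains only the rename entry, even though all three messages share the "CentralAuth" channel. In a log search tool the equivalent would be a query on the channel plus the tag field.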

Change 994877 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@master] GlobalRename: Mark debug log entries

https://gerrit.wikimedia.org/r/994877

Change 994877 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] GlobalRename: Mark debug log entries

https://gerrit.wikimedia.org/r/994877