Page MenuHomePhabricator

MediaWiki normalized_message field has placeholders replaced since October 12th
Closed, ResolvedPublic

Description

There is a regression in the MediaWiki logged errors. The normalized_message use to have placeholders fields which, since October 12th, are now replaced by the values. When previously we had bunch of errors deduplicated via eg [{reqId}] {exception_url} PHP Warning: RedisException: Connection timed out each log now has a normalized_message with the the actual request ID and exception URL.

That was confusing me when I watched the logs last week as part of deploying the train ( T347081 ).

An example is searching for RedisException on the MediaWiki dashboard with a filter normalized_message: "[{reqId}] {exception_url} PHP Warning: RedisException: Connection timed out":

normalized_message_dedupe.png (336×695 px, 23 KB)

And since October 12th the deduplication is broken:

normalized_message_duplicates.png (389×1 px, 126 KB)

For messages matching RedisException

The last message with placeholders was at Oct 12, 2023 @ 13:29:07.992. The next one had placeholders replaced and occurred at Oct 12, 2023 @ 14:04:38.536. That is rather small time window.

Event Timeline

hashar triaged this task as Unbreak Now! priority.Oct 17 2023, 12:17 PM

Changing priority to Unbreak Now! cause that disrupt our triaging of MediaWiki logs since deduplication does not work anymore.

From operations/mediawiki-config, I guess that is related to 1.41.0-wmf.30 / T347081 which got promoted to group 2 wikis by c1da9c7584c52d7937c0fcf259fb36ef05bfe78b after Thu, 12 Oct 2023 13:57:10 +0000

commit c1da9c7584c52d7937c0fcf259fb36ef05bfe78b
Author: Antoine Musso <hashar@free.fr>
Date:   Thu Oct 12 13:56:25 2023 +0000

    group2 wikis to 1.41.0-wmf.30
    
    Bug: T347081
    Change-Id: Ia16d757e4b387d5ef0f48ca7cb52ee22c571ad6a

Notes (review):
    Code-Review+2: TrainBranchBot <mmodell+trainbranchbot@wikimedia.org>
    Verified+2: jenkins-bot
    Submitted-by: jenkins-bot
    Submitted-at: Thu, 12 Oct 2023 13:57:10 +0000
    Reviewed-on: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/965518
    Project: operations/mediawiki-config
    Branch: refs/heads/master

Not sure which of the 1.41.0-wmf.30 change is the culprit :)

Maybe due to https://gerrit.wikimedia.org/r/c/mediawiki/core/+/963661 by @Krinkle . The commit message states the MonoLog processors and handlers are inserted in a reversed order compared to before:

The same call overhead can also be removed for the processors and handlers array. Instead of calling push on each item, we can pass these to the constructor directly. This does mean the order will now be reversed but this is imho less surprising than before (i.e. call from first to last, instead of reversed), and to my knowledge none of our processors or handlers depend on side-effect from previous ones.

So if previously had a processor replacing the tokens then another one adding the normalized_message, they would now be processed in a reverse order: the normalized_message is added with placeholders then whatever other processor ends up replacing the tokens?

If I look at operations/mediawiki-config in wmf-config/logging.php there is a comment about the order:

T124985: The Processors listed in $wmgMonologProcessors are applied to a message in reverse list order (bottom to top).

And the same comment mentions normalized_message:

The normalized_message processor needs to be listed *after* the psr processor to work as expected.

That config has:

wmf-config/logging.php
$wmgMonologProcessors = [
    'wiki' => [
        'class' => \MediaWiki\Logger\Monolog\WikiProcessor::class,
    ],
    'psr' => [
        'class' => \Monolog\Processor\PsrLogMessageProcessor::class,
    ],
    'wmfconfig' => [
        'factory' => static function () {
        // code for normalized_message

So it is listed after but was previously processed first. As a result of the change to mediawiki/core the order is no natural and our normalized_message is processed last?

The normalized_message is forged using the`message` field which I guess had its placeholder processed by the wiki or psr processors that now run earlier.

Thus I think the fix is to reorder our $wmgMonologProcessors to use:

  1. wiki
  2. wmfconfig
  3. psr

And rewrite the comment to drop the "reverse list order" and mentions the wmfconfig should be listed before the psr processor.

Change 966529 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/mediawiki-config@master] logging: reorder wmgMonologProcessors entries

https://gerrit.wikimedia.org/r/966529

Change 966529 merged by jenkins-bot:

[operations/mediawiki-config@master] logging: reorder wmgMonologProcessors entries

https://gerrit.wikimedia.org/r/966529

Mentioned in SAL (#wikimedia-operations) [2023-10-17T18:38:34Z] <hashar@deploy2002> Started scap: Backport for [[gerrit:966529|logging: reorder wmgMonologProcessors entries (T349086)]]

Mentioned in SAL (#wikimedia-operations) [2023-10-17T18:39:54Z] <hashar@deploy2002> hashar: Backport for [[gerrit:966529|logging: reorder wmgMonologProcessors entries (T349086)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-10-17T18:46:48Z] <hashar@deploy2002> Finished scap: Backport for [[gerrit:966529|logging: reorder wmgMonologProcessors entries (T349086)]] (duration: 08m 14s)

hashar claimed this task.

I have deployed the change after @Krinkle had a look at it and in sync with @brennen who is running the MediaWiki train this week.

Looks like the normalized_message are good again, but anything that was emitted by 1.41.0-wmf.30 / 1.42.0-wmf.1 is "corrupted" until the fix got live on the servers. I don't think it is a big deal.