Page MenuHomePhabricator

Move mediawiki to new logging infrastructure
Closed, ResolvedPublic

Description

As part of T205849: Begin the implementation of Q1's Logging Infrastructure design (2018-19 Q2 Goal) and specifically T205851: Migrate >=90% of existing Logstash traffic to the logging pipeline logstash traffic is moving from being sent over the network in a multitude of formats to being sent to local logging facilities (i.e. rsyslog and/or journald) using a single format, namely syslog + JSON payload, IOW what MediaWiki sends logstash now over plaintext UDP.

When re-using local logging facilities the message stream contains messages from other applications as well, so we'll have to explicitly tag JSON payloads to have a robust detection mechanism (i.e. prepending the syslog message with @cee: the CEE cookie).

In terms of limitations the default rsyslog maximum message size is 8k, though it can be bumped. Today's limit is the udp MTU i.e. 1472 bytes.

To ease the transition the idea is to provide a local udp syslog endpoint: we already know how mediawiki behaves when writing logging to udp though the idea is to eventually move to unix socket /dev/log as it is standard syslog.

Additional context/information: https://wikitech.wikimedia.org/wiki/Logstash/SRE_onboard

Event Timeline

fgiunchedi triaged this task as Normal priority.Dec 4 2018, 4:12 PM
fgiunchedi created this task.
fgiunchedi added a comment.EditedDec 4 2018, 4:18 PM

I've looked briefly at how to implement prefixing syslog json messages with @cee: and I'd say we could do it on the "syslog side" i.e. ./includes/debug/logger/monolog/SyslogHandler.php or "logstash side" i.e. ./includes/debug/logger/monolog/LogstashFormatter.php. I don't have strong opinions on either really!

fgiunchedi updated the task description. (Show Details)Dec 4 2018, 4:30 PM
Paladox added a subscriber: Paladox.Dec 4 2018, 4:49 PM

I've looked briefly at how to implement prefixing syslog json messages with @cee: and I'd say we could do it on the "syslog side" i.e. ./includes/debug/logger/monolog/SyslogHandler.php or "logstash side" i.e. ./includes/debug/logger/monolog/LogstashFormatter.php. I don't have strong opinions or either really!

If I am understanding the upstream documentation correctly, the desired result in the on-the-wire UDP packet payload is something like: <PRI>DATETIME HOSTNAME PROGRAM: @cee: {...json here..}. The "@cee: " string in the middle here is the new part that needs to be added.

I concur with @fgiunchedi that either the handler or the formatter could be changed to implement this. In either case I would suggest doing it via a new class rather than adding a feature flag to the existing classes that are used in the WMF production configuration. To me it makes a bit more sense to implement the change in the formatter because this change is really about the message payload encoded in the syslog UDP packet rather than a change to the basic syslog packet itself.

If I was going to undertake this work (which I'm not volunteering for due to other existing commitments), I would probably introduce a new MediaWiki\Logger\Monolog\CeeFormatter class on the MediaWiki side that extends MediaWiki\Logger\Monolog\LogstashFormatter. This class would override the public function format(array $record) method of the parent class something like this:

public function format( array $record ) {
    return "@cee: " . parent::format( $record );
}

Once this change was available on all wikis configuration could be changed to use this formatter instead of the current MediaWiki\Logger\Monolog\LogstashFormatter and the handler output directed towards the rsyslog collector (i.e. deploy in week 0, config change in week 1).

There are certainly other ways to solve this--and some that may be quicker to get into production, like hacking the formatter directly into mediawiki-config.git's wmf-config/logging.php file--but I think this would be the more long term stable and supportable route.

fgiunchedi updated the task description. (Show Details)Dec 5 2018, 11:02 AM

Change 477791 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[mediawiki/core@master] WIP: add CeeFormatter

https://gerrit.wikimedia.org/r/477791

I've looked briefly at how to implement prefixing syslog json messages with @cee: and I'd say we could do it on the "syslog side" i.e. ./includes/debug/logger/monolog/SyslogHandler.php or "logstash side" i.e. ./includes/debug/logger/monolog/LogstashFormatter.php. I don't have strong opinions or either really!

If I am understanding the upstream documentation correctly, the desired result in the on-the-wire UDP packet payload is something like: <PRI>DATETIME HOSTNAME PROGRAM: @cee: {...json here..}. The "@cee: " string in the middle here is the new part that needs to be added.
I concur with @fgiunchedi that either the handler or the formatter could be changed to implement this. In either case I would suggest doing it via a new class rather than adding a feature flag to the existing classes that are used in the WMF production configuration. To me it makes a bit more sense to implement the change in the formatter because this change is really about the message payload encoded in the syslog UDP packet rather than a change to the basic syslog packet itself.
If I was going to undertake this work (which I'm not volunteering for due to other existing commitments), I would probably introduce a new MediaWiki\Logger\Monolog\CeeFormatter class on the MediaWiki side that extends MediaWiki\Logger\Monolog\LogstashFormatter. This class would override the public function format(array $record) method of the parent class something like this:

public function format( array $record ) {
    return "@cee: " . parent::format( $record );
}

Once this change was available on all wikis configuration could be changed to use this formatter instead of the current MediaWiki\Logger\Monolog\LogstashFormatter and the handler output directed towards the rsyslog collector (i.e. deploy in week 0, config change in week 1).
There are certainly other ways to solve this--and some that may be quicker to get into production, like hacking the formatter directly into mediawiki-config.git's wmf-config/logging.php file--but I think this would be the more long term stable and supportable route.

Thanks @bd808 for the context/insight, I agree having the change in core is the right path. I took a stab at the patch, will need some guidance for sure on the mw core production deployment part.

Change 477791 merged by jenkins-bot:
[mediawiki/core@master] monolog: add CeeFormatter

https://gerrit.wikimedia.org/r/477791

Thanks @bd808 for the context/insight, I agree having the change in core is the right path. I took a stab at the patch, will need some guidance for sure on the mw core production deployment part.

The CeeFormatter patch is merged now, so the class should be live on the beta clusters wikis and will get pushed out to the prod wikis with the next train (week of December 10th).

I would suggest that the next step be to figure out how to configure the beta cluster to start logging using this new monologrsyslog(whatever)logstash pipeline first to shake out the kinks. Both prod and the beta cluster share the same wmf-config/logging.php script to setup the Monolog classes. One way that you could vary things per-environment is to introduce new $wmg.... variables in wmf-config/InitialiseSettings.php & wmf-config/InitialiseSettings-labs.php that would let you switch things based on wiki and then use the new variable(s) in logging.php to swap out the PHP class used by the existing 'logstash' formatter and change the target of the generated 'logstash-<level>' handlers as needed.

Change 478621 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/mediawiki-config@master] logging: special-case CeeFormatter when sending logstash to localhost

https://gerrit.wikimedia.org/r/478621

Thanks @bd808 for the context/insight, I agree having the change in core is the right path. I took a stab at the patch, will need some guidance for sure on the mw core production deployment part.

The CeeFormatter patch is merged now, so the class should be live on the beta clusters wikis and will get pushed out to the prod wikis with the next train (week of December 10th).

Awesome!

I would suggest that the next step be to figure out how to configure the beta cluster to start logging using this new monologrsyslog(whatever)logstash pipeline first to shake out the kinks. Both prod and the beta cluster share the same wmf-config/logging.php script to setup the Monolog classes. One way that you could vary things per-environment is to introduce new $wmg.... variables in wmf-config/InitialiseSettings.php & wmf-config/InitialiseSettings-labs.php that would let you switch things based on wiki and then use the new variable(s) in logging.php to swap out the PHP class used by the existing 'logstash' formatter and change the target of the generated 'logstash-<level>' handlers as needed.

I tried https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/478621 before seeing your comment, which sort of achieves the same but varying on whether we're sending to localhost. I like more your approach of using a variable though so I'll change the patch to do that instead.

Change 478621 merged by Filippo Giunchedi:
[operations/mediawiki-config@master] logging: introduce cee formatter usage

https://gerrit.wikimedia.org/r/478621

Mediawiki in beta is now logging on localhost using ceeformatter, mediawiki-related dashboards on https://logstash-beta.wmflabs.org look good to my untrained eye. Next up is getting someone else more familiar with mw logging to confirm and drafting patches for gradual production rollout.

Change 480052 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/mediawiki-config@master] Use logging pipeline for group0

https://gerrit.wikimedia.org/r/480052

Change 480057 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] role: add rsyslog::udp_localhost_compat to mediawiki hosts

https://gerrit.wikimedia.org/r/480057

Change 480057 merged by Filippo Giunchedi:
[operations/puppet@production] role: add rsyslog::udp_localhost_compat to mediawiki hosts

https://gerrit.wikimedia.org/r/480057

Change 480434 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] role: include kafka_shipper on mediawiki hosts

https://gerrit.wikimedia.org/r/480434

Change 480434 merged by Filippo Giunchedi:
[operations/puppet@production] role: include kafka_shipper on mediawiki hosts

https://gerrit.wikimedia.org/r/480434

Change 480052 merged by Filippo Giunchedi:
[operations/mediawiki-config@master] Use logging pipeline for group0

https://gerrit.wikimedia.org/r/480052

Change 480473 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/mediawiki-config@master] Use array for wmgLogstashServers

https://gerrit.wikimedia.org/r/480473

Change 480473 merged by jenkins-bot:
[operations/mediawiki-config@master] Use array and IP address for wmgLogstashServers

https://gerrit.wikimedia.org/r/480473

MW logging through the logging pipeline is deployed to group0 (testwiki / mediawikiwiki), results: https://logstash.wikimedia.org/goto/9ab84d79bb82da19288a0a1ee9f3b125 doing about 1.5 messages/s for input input/kafka/rsyslog-udp-localhost

jijiki added a subscriber: jijiki.Dec 18 2018, 2:13 PM

Change 481825 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/mediawiki-config@master] Onboard group1 to new logging infrastructure

https://gerrit.wikimedia.org/r/481825

Change 481825 merged by Filippo Giunchedi:
[operations/mediawiki-config@master] Onboard group1 to new logging infrastructure

https://gerrit.wikimedia.org/r/481825

Mentioned in SAL (#wikimedia-operations) [2019-01-07T10:31:07Z] <filippo@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Move group1 to new logging infrastructure - T211124 (duration: 00m 45s)

Change 483384 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/mediawiki-config@master] Default production logging to new logging infrastructure

https://gerrit.wikimedia.org/r/483384

Change 483384 merged by jenkins-bot:
[operations/mediawiki-config@master] Default production logging to new logging infrastructure

https://gerrit.wikimedia.org/r/483384

Mentioned in SAL (#wikimedia-operations) [2019-01-16T08:42:12Z] <filippo@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Default to new logging infrastructure - T211124 (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2019-01-16T09:00:32Z] <godog> test roll-restart rsyslog on mw hosts in eqiad - T211124

fgiunchedi closed this task as Resolved.Jan 16 2019, 10:06 AM
fgiunchedi claimed this task.

This is completed!