
Retire udp2log: onboard its producers and consumers to the logging pipeline
Open, MediumPublic

Description

We are deprecating udp2log in production, so its current users should be migrated to the logging pipeline instead.

List of candidates for migration:

  • mediawiki
  • scap

Scap is easy (low volume). For MediaWiki we'll have to do some thinking: the volume is significant (~18-20k msg/s) and the udp2log output is consumed by multiple people on mwlog hosts, so the move should be transparent to them (i.e. change the transport to Kafka but still write to files).

Event Timeline

herron triaged this task as Medium priority.Oct 2 2018, 5:24 PM
fgiunchedi renamed this task from Deprecate >= 50% of udp2log producers to Retire udp2log: onboard its producers and consumers to the logging pipeline.Jan 16 2019, 11:22 AM
fgiunchedi updated the task description.

This is the outline of the plan to move mediawiki logging off udp2log and onto the logging pipeline's Kafka (cc @bd808 @aaron @Ottomata)

Transport

A new rsyslog localhost udp endpoint is introduced on mediawiki hosts (e.g. named mwlog) that takes udp syslog and forwards it onto a set of kafka topics (separate from topics consumed by logstash). Messages are then consumed by mwlog hosts, via kafkatee or rsyslog, and written to per-channel files to keep compatibility with what we have now.
Using a separate endpoint and set of topics allows for some tuning/flexibility, since the udp2log stream is currently quite a bit bigger than the logstash stream: Kafka topic retention, rate limits, etc. would need to differ from the existing logstash topics.
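
A rough sketch of what the rsyslog side of this transport might look like; the port, broker host, topic name, and file path are assumptions for illustration, not the actual configuration:

```
# Hypothetical /etc/rsyslog.d/30-mwlog.conf -- all names and ports illustrative
module(load="imudp")
module(load="omkafka")

# Local-only UDP endpoint that MediaWiki emits to, bound to its own ruleset
input(type="imudp" address="127.0.0.1" port="10514" ruleset="mwlog")

ruleset(name="mwlog") {
    # Forward onto a dedicated Kafka topic, separate from the logstash topics
    action(type="omkafka"
           broker=["localhost:9092"]
           topic="mwlog"
           template="RSYSLOG_FileFormat")
}
```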

Formatting

Log entry formatting is changed from "line" to syslog + JSON by MediaWiki before emitting to localhost; JSON is kept as the Kafka message format as well. Upon consumption on mwlog hosts the JSON is then formatted back into "lines" using the existing formatting found there.
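
On the producer side, the change described above could look roughly like this; the port, PRI facility, and JSON field names are assumptions about the eventual schema, not the actual MediaWiki implementation:

```python
import json
import socket

def mwlog_datagram(channel, severity, message, extra=None):
    """Build a syslog-framed JSON payload (RFC 3164-style PRI header).

    Facility 16 (local0) and the field names are illustrative only.
    """
    facility = 16  # local0, assumed
    pri = facility * 8 + severity
    payload = {"channel": channel, "message": message}
    if extra:
        payload.update(extra)
    return "<%d>%s: %s" % (pri, channel, json.dumps(payload))

def send_mwlog(datagram, host="127.0.0.1", port=10514):
    """Fire-and-forget UDP send, mirroring how udp2log producers behave."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

The key point is that framing stays syslog-compatible (so a stock rsyslog input can receive it) while the body carries structured JSON all the way to Kafka.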

Open questions

  • The plan has syslog + JSON as the formatting when transporting on Kafka, since that's what we use for logstash already and it preserves more information. Alternatively, we could write syslog + the current formatting to Kafka?
  • The idea would be to name this "mwlog" (e.g. configuration, topic names, etc.) to steer away from "udp2log"; does that seem sensible?

Implementation

The following patches are meant to implement the plan above, specifically:

> The plan has syslog + json as formatting, since that's what we use for logstash already and preserves more information. Although we could have syslog + current formatting?

People (including @tstarling?) have advocated in the past for keeping the logs on mwlog1001 in the more human readable format. It might be a reasonable compromise to build a shell-pipeline-compatible utility that can be used to reformat JSON log event records, for doing things like `grep foo some.log | humanlog` or `tail -f some.log | humanlog`. A utility like that could also pretty easily do simple extraction of particular elements of the JSON structure, which might be easier to use than some of the typical awk magic that folks use when data mining from the files. I can dig up some python scripts I have written in the past to kickstart something like this.
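
A humanlog-style filter like the one described above could start as small as this; the field names (`timestamp`, `host`, `channel`, `message`) are assumptions about the eventual JSON schema:

```python
import json
import sys

def humanize(line):
    """Render one JSON log record as a udp2log-style line.

    Falls back to the raw input when the line is not valid JSON,
    so the filter stays safe in mixed pipelines.
    """
    try:
        rec = json.loads(line)
    except ValueError:
        return line.rstrip("\n")
    # Field names below are assumed; adjust to the real schema.
    return "%s %s %s: %s" % (
        rec.get("timestamp", "-"),
        rec.get("host", "-"),
        rec.get("channel", "-"),
        rec.get("message", ""),
    )

if __name__ == "__main__":
    # Usage: grep foo some.log | humanlog, or tail -f some.log | humanlog
    for raw in sys.stdin:
        print(humanize(raw))
```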

> It might be a reasonable compromise to build a shell pipeline compatible utility that can be used to reformat JSON log event records

kafkatee can do this, and was in fact built for it:

https://github.com/wikimedia/analytics-kafkatee/blob/master/kafkatee.conf.example#L129-L207

Qs:

Are the logs sent using Monolog?

Is there just one topic 'mwlog', or multiple, one per channel? I'm asking just in case we should consider using this rsyslog feature to log to Kafka via Monolog, rather than our effort in T216163: Add monolog adapters for Eventbus.

>> The plan has syslog + json as formatting, since that's what we use for logstash already and preserves more information. Although we could have syslog + current formatting?
>
> People (including @tstarling?) have advocated in the past for keeping the logs on mwlog1001 in the more human readable format. It might be a reasonable compromise to build a shell-pipeline-compatible utility that can be used to reformat JSON log event records, for doing things like `grep foo some.log | humanlog` or `tail -f some.log | humanlog`. A utility like that could also pretty easily do simple extraction of particular elements of the JSON structure, which might be easier to use than some of the typical awk magic that folks use when data mining from the files. I can dig up some python scripts I have written in the past to kickstart something like this.

Thanks for the feedback!

I have edited my comment to clarify that the scope of JSON vs current formatting was limited to writing to Kafka; at least in the first phase we're focusing on making the transport more reliable and secure while keeping the format as-is on mwlog.

We would indeed need something to extract the right fields when reading from Kafka and writing to files, to keep the same human formatting as now, so we'd definitely appreciate any help/tools towards that!

re: the open question itself, I'm leaning towards having JSON on Kafka for multiple reasons: it makes Kafka messages uniform (mw logstash logging is already JSON), so consumers only have to handle one format, and we won't lose information/context compared to what mw knows about the log message.

> Qs:
>
> Are the logs sent using Monolog?
>
> Is there just one topic 'mwlog', or multiple, one per channel? I'm asking just in case we should consider using this rsyslog feature to log to Kafka via Monolog, rather than our effort in T216163: Add monolog adapters for Eventbus.

There will be one topic per syslog severity, similar to what's happening now for mediawiki logstash logging. We considered one topic per channel but ultimately decided against it due to potential topic flooding, and because it'd be a little fragile to reconstruct the channel name from the topic name (e.g. choosing a suitable separator/prefix that can't appear in a channel name).
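
The severity-to-topic scheme described above can be sketched as follows; the `mwlog-` prefix is an assumption, the point being that the channel name travels inside the message rather than in the topic name:

```python
# Sketch of the "one topic per syslog severity" routing described above.
# The "mwlog-" prefix is an assumption; what matters is that the channel
# rides along inside the payload, not in the topic name.
SEVERITIES = [
    "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug",
]

def topic_for(severity_code):
    """Map a numeric syslog severity (0-7) to a Kafka topic name."""
    return "mwlog-" + SEVERITIES[severity_code]

def route(record):
    """Pick the topic for a log record; the channel stays in the payload."""
    return topic_for(record["severity"]), record
```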

> re: the open question itself I'm leaning towards having json on kafka

Yes please!

> There will be one topic per syslog severity [...]

Ok great. We are working on making Monolog able to send events via the EventBus extension, which will ultimately log to Kafka. The Monolog channels we are sending are really 'request logs' (which is a kind of event), but since it was logging I was wondering if we should consider using your rsyslog stuff to get this data to Kafka instead. If we can't easily control the topics, then we can rule out this option. Thanks!

>> re: the open question itself I'm leaning towards having json on kafka
>
> Yes please!
>
>> There will be one topic per syslog severity [...]
>
> Ok great. We are working on making Monolog able to send events via the EventBus extension, which will ultimately log to Kafka. The Monolog channels we are sending are really 'request logs' (which is a kind of event), but since it was logging I was wondering if we should consider using your rsyslog stuff to get this data to Kafka instead. If we can't easily control the topics, then we can rule out this option. Thanks!

ack, thanks for additional context!

Change 498106 merged by Filippo Giunchedi:
[mediawiki/core@master] monolog: add MwlogHandler

https://gerrit.wikimedia.org/r/498106

One year later, this class appears not to be used anywhere. Is it expected to become used or have plans changed?

> Change 498106 merged by Filippo Giunchedi:
> [mediawiki/core@master] monolog: add MwlogHandler
>
> https://gerrit.wikimedia.org/r/498106
>
> One year later, this class appears not to be used anywhere. Is it expected to become used or have plans changed?

The former; the plan is still to move away from udp2log and onto Kafka / the logging pipeline!

I'm confused. I thought we were already on the Kafka pipeline, with udp2log being legacy to phase out by building atop the same pipeline?

Is the intention to have MediaWiki format and dispatch every message twice, with two different rsyslog/kafka handlers?

Perhaps it would make sense to use only one. Things we don't want to ingest in Logstash can be dropped at intake. E.g. MediaWiki can add "udponly: 1" or "logstash: no" or some such to the packet as needed. The Monolog stack is fairly complex, so not having to instantiate two of them long-term would be nice. I suppose it might also make the pipeline easier for developers to reason about, in terms of consistency etc.
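
A minimal sketch of the drop-at-intake idea, using the hypothetical `logstash: no` tag from the comment above (the field name and semantics are not an agreed schema):

```python
import json

def wants_logstash(raw):
    """Decide at intake whether a record should be ingested by Logstash.

    MediaWiki would tag records it wants kept out of Logstash, e.g.
    {"logstash": "no", ...}; untagged records are ingested as before.
    """
    rec = json.loads(raw)
    return rec.get("logstash", "yes") != "no"

def intake(raw_records):
    """Split one Kafka stream into logstash-bound and file-only records."""
    to_logstash, file_only = [], []
    for raw in raw_records:
        (to_logstash if wants_logstash(raw) else file_only).append(raw)
    return to_logstash, file_only
```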

We are on the Kafka pipeline for MW logs that were previously sent to logstash over the network; udp2log is still in place due to the high volume of logs, but yes, eventually we'd like to deprecate udp2log too and move everything to Kafka.

In terms of processing I don't know ATM if logstash has enough capacity to ingest everything and drop unwanted messages. My initial thought was to write udp2log messages to a different set of Kafka topics and consume them from mwlog hosts with kafkacat.

re: having a single Monolog instance to handle all logs and tag e.g. logstash: no, where would that logic live in MW? IIRC the udp2log vs logstash switch is currently based on severity, e.g. all debug messages make it to udp2log.
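
For reference, the severity-based split recalled above (everything reaches udp2log, only sufficiently severe records also reach logstash) can be modeled as follows; the numeric threshold is an assumption for illustration, not the actual MediaWiki configuration:

```python
# Rough model of the current split: all records go to udp2log, while only
# records at or above a severity threshold also go to Logstash.
# The threshold value is assumed for illustration.
LOGSTASH_MIN_SEVERITY = 6  # syslog "info"; debug (7) stays udp2log-only

def destinations(severity_code):
    """Return the set of destinations for a record of this syslog severity."""
    dests = {"udp2log"}
    if severity_code <= LOGSTASH_MIN_SEVERITY:  # lower number = more severe
        dests.add("logstash")
    return dests
```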