
Decouple logging infrastructure failures from MediaWiki logging
Closed, Resolved (Public)

Description

Notes from post mortem:

We need to reduce the heartache of inline logging to logstash: log to local rsyslog; rsyslog queues and degrades gracefully, and MW chugs on happily even during a flood. Without this we will run an unnecessarily elevated risk of losing logs at the slightest Redis outage, and of ongoing cascading failures during periods of error or limited outage. The timeout to rsyslog should be very small. Logstash can alert on "dead" hosts, I believe.

  • https://gerrit.wikimedia.org/r/#/c/181350/7/wmf-config/logging.php should have been reviewed more closely.
  • Set a *much* lower timeout for Redis / Memcached, *especially* for non-critical things like logging
  • Switch to UDP logging instead of Redis?
  • The 0.25 second timeout setting was copied from $wgMemCachedTimeout / $wgBloomFilterStores['main']['redisConfig']['connectTimeout']. It is probably not a good value for these configuration options either.
  • The timeout should be tested by making the logstash redis server actually unreachable and making sure that an app server under load isn't otherwise affected (i.e., everything but logging to logstash should work)
  • A timeout alone won't do it. Any time we have something important failing there are going to be showers of those for EACH request. Async is the only way (either natively or by using rsyslog). +1
  • Monolog stack allows us to easily change the transport layer by using a different "Handler" class. Just need to pick one (or even write our own if crazy). Rsyslog and GELF would both be possible. Either could use UDP without going back to the fragile udp2log -> log2udp -> logstash pipeline we were using before. Local syslog forwarding is working pretty well for the hhvm logs I think. That may be the easiest to try first. One problem with that right now is that all of the hhvm rsyslog servers are pointed at the same IP in the logstash cluster instead of spreading the output to all of them. --bd808

Related Objects

Event Timeline

chasemp created this task. Feb 5 2015, 8:02 PM
chasemp raised the priority of this task to Needs Triage.
chasemp updated the task description.
chasemp added a subscriber: chasemp.
Restricted Application added a subscriber: Aklapper. Feb 5 2015, 8:02 PM

Monolog logging is currently disabled in the prod cluster via https://gerrit.wikimedia.org/r/#/c/188837/. We should not re-enable it until we figure out the right transport to use for logstash shipping.

bd808 triaged this task as High priority. Feb 5 2015, 8:18 PM

The two easiest paths forward for this are to switch from Monolog\Handler\RedisHandler to either:

  • Monolog\Handler\SyslogUdpHandler and add rsyslog config to forward to logstash as is done for HHVM and Apache logs
  • Monolog\Handler\GelfHandler and send UDP packets directly to logstash as the nodejs and hadoop clusters do

These are both transports that we already have enabled for Logstash and either would shift the MediaWiki output to fire and forget UDP packets.
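For concreteness, the handler swap in the logging config might look roughly like this. This is a sketch only: the host, port, and facility values are placeholders, and the constructor shapes are Monolog 1.x / gelf-php 1.x as best I know them, not vetted production config:

```php
<?php
use Monolog\Logger;
use Monolog\Handler\SyslogUdpHandler;

$logger = new Logger( 'mediawiki' );

// Option 1: syslog over UDP; rsyslog (or Logstash's syslog input)
// receives the datagrams. Host and port here are placeholders.
$logger->pushHandler( new SyslogUdpHandler( '127.0.0.1', 10514 ) );

// Option 2: GELF over UDP straight to Logstash, using the
// bzikarsky/gelf-php library (class names per gelf-php 1.x):
// $transport = new Gelf\Transport\UdpTransport( 'logstash.example.net', 12201 );
// $logger->pushHandler( new Monolog\Handler\GelfHandler( new Gelf\Publisher( $transport ) ) );
```

Either way the rest of the Monolog stack (processors, formatters, channels) stays the same; only the handler at the bottom changes.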

The GELF handler requires the use of an additional PHP library (https://github.com/bzikarsky/gelf-php) that would need a security review to add to mediawiki/vendor. One thing I like better about it, however, is that it would allow per-request round robin selection of a destination Logstash server. The current rsyslog forwarding pins all rsyslog input to a single Logstash host. This could probably be fixed with some LVS magic in front of the Logstash aggregators.

For the love of consistency I would vote rsyslog. It's very standard and fairly easily debuggable, and we have to live with it anyway. I'm not sure what you mean by per-request round robin selection. Do you mean that before sending each syslog event it would pick a logstash host from a pool and fire it off?

The way I have distributed syslog before is per host. We could potentially use hiera to split our app server hosts into groups, with each group using a particular logstash aggregator? I would vote for that. Generally rsyslog filter conditions are robust enough that I wouldn't be surprised if we could route things in that manner.

One possible negative of rsyslog as the shipping transport for structured application logs is the hard 4K line length limit. 4K sounds like a lot of text for a log message, but stack traces can get really big pretty easily. The message format that we are using with Monolog for shipping to Logstash is a JSON structure and if it is truncated by the transport layer the event will not be decoded and stored by Logstash. I'll try to figure out how common really large stacktraces are in our PHP stack by looking at some logs for the past few weeks.

Segregating the rsyslog traffic destination by host would probably cover the same thing I was trying to do with the random selection of Redis address per request in the current PHP code. Mostly I wanted to spread the MediaWiki log events across multiple Logstash backends in a semi-fair spread. For the non-Redis transports we are currently using (GELF and syslog) we have not been balancing the traffic. All HHVM & apache syslog events are sent to logstash1001 currently and the GELF applications have each picked a single logstash100* host as well.

> One possible negative of rsyslog as the shipping transport for structured application logs is the hard 4K line length limit. 4K sounds like a lot of text for a log message, but stack traces can get really big pretty easily. The message format that we are using with Monolog for shipping to Logstash is a JSON structure and if it is truncated by the transport layer the event will not be decoded and stored by Logstash. I'll try to figure out how common really large stacktraces are in our PHP stack by looking at some logs for the past few weeks.

I think we could accommodate that with a global rsyslog directive:

http://www.rsyslog.com/doc/v8-stable/configuration/global/index.html#true-global-directives

$MaxMessageSize (32k is the current maximum needed for IHE)

This is for TCP, not UDP, but I think we would be interested in TCP as a transport mechanism if the actual offloading of the message from the box is done asynchronously from processing client requests.
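As a sketch of the directive in question (placement matters — per the rsyslog docs, $MaxMessageSize must appear before any input modules are loaded — and the value here is illustrative, not a recommendation):

```
# /etc/rsyslog.conf (sketch)
# Must come before any module(load=...) / $ModLoad lines.
$MaxMessageSize 64k
```

The default ceiling in stock rsyslog is far lower than the payloads a big JSON-encoded stack trace would produce, which is the whole point of raising it.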

faidon added a subscriber: faidon. Feb 9 2015, 5:08 PM

I think I'd prefer going with Gelf directly from MediaWiki rather than involving another component in the path. Are there any other downsides compared to syslog besides the extra PHP library dependency?

> I think I'd prefer going with Gelf directly from MediaWiki rather than involving another component in the path. Are there any other downsides compared to syslog besides the extra PHP library dependency?

Not that I've found yet but I haven't deeply investigated Monolog's GELF support. I've been exploring the syslog path in T87521 but I can trial a GELF configuration as well once I get that working.

One downside of the syslog transport is that it will require configuration changes for the rsyslog on each MediaWiki host in addition to the Monolog configuration. Rsyslog would need to be configured to listen for UDP input at least on 127.0.0.1 and a relay rule would also be needed to forward the messages either directly to Logstash or to an intermediate aggregator. Initial investigation has also shown that the built-in Monolog syslog packet formatting system is not sufficient for our needs, but the fix for this seems trivial at this point.
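The per-host rsyslog changes described above would be along these lines — illustrative only; the port, facility selector, and destination are placeholders:

```
# Accept MediaWiki syslog datagrams on loopback only.
module(load="imudp")
input(type="imudp" address="127.0.0.1" port="10514")

# Relay them on, either directly to Logstash or to an
# intermediate aggregator ("@" = UDP, "@@" = TCP).
local0.* @logstash.example.net:10514
```

That's two moving pieces per host (input + relay rule) on top of the Monolog change, which is the configuration overhead being weighed here.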

> Not that I've found yet but I haven't deeply investigated Monolog's GELF support. I've been exploring the syslog path in T87521 but I can trial a GELF configuration as well once I get that working.

I actually meant T88870. What I've found there is that we could use a syslog over UDP transport and send the datagrams either directly to Logstash or use rsyslog as an intermediate buffer/aggregator.

faidon added a subscriber: ori. Feb 10 2015, 4:14 PM

We don't really care about local logging, so I don't see why we'd involve the local rsyslog. Remote syslog over UDP and GELF over UDP sound equally good from my perspective (i.e. without taking into account @ori's concerns about embedding JSON structures).

greg added a subscriber: greg. Feb 10 2015, 4:29 PM
chasemp added a comment (edited). Feb 10 2015, 4:32 PM

Thought about this a bit and actually have had this same conversation in the past. I don't think rsyslog is clearly a better option but if I was making the call I would choose it for these reasons:

  • Sending to local rsyslog means that in the event of whacky things happening (like a cascading failure that happens to bring logstash down while being coupled with an unforeseen network error) we can route logs to local disk on specific hosts to troubleshoot trivially
  • Rsyslog is another gear in the mechanism but with pretty low overhead. I think also that listening for logs locally is more or less standard practice anyway. Not much in the configuration overhead department.
  • It is true we don't do log review in any real fashion and the vast majority of our logs pass silently, but the periods where we really care about doing reconstruction of a timeline seem to be self-selecting as periods where network errors happen and thus we increase the risk of losing the only logs we actually care about.
  • This is more or less free resiliency and flexibility in the log routing pipeline that also maintains a standard

That being said, making it so this isn't a synchronous process with client requests was the only real actionable item, so I'm good with both proposed solutions. Just wanted to add my 2 cents :)

bd808 added a comment (edited). Feb 11 2015, 1:00 AM

Input from one logging customer:

[14:44]  <    bd808>	hoo: how important is debug message capture loss to you as a consumer of the logs? Meaning how hard do you think we should be working to make sure that every wfDebugLog call ends up on fluorine and in logstash?
[14:44]  <    bd808>	Obviously no logging sucks, but is some amount of loss reasonable?
[14:45]  <      hoo>	You mean from the stuff we have set as what we want to log?
[14:45]  <      hoo>	In InitializeSettings
[14:45]  <    bd808>	Asking this in context of https://phabricator.wikimedia.org/T88732
[14:45]  <    bd808>	the redis thing was an attempt to be really lossless
[14:46]  <    bd808>	that blew up
[14:46]  <    bd808>	now trying to find real requirements to inform the next gen design
[14:46]  <      hoo>	I can't give you any numbers, but I had various occasions in the past where I needed one specific log entry
[14:47]  <      hoo>	So having loss there is a rather huge trade off
[14:47]  <    bd808>	I can say pretty confidently that udp2log loses some unquantified amount of messages
[14:48]  <      hoo>	I know :/
[14:48]  <      hoo>	Also api.log is now sampled and all things like taht :/
[14:48]  <      hoo>	Especially for the SULF stuff I had quite some occasions where I just couldn't investigate because of missing logs
[14:48]  <  legoktm>	+1 ^
[14:48]  <    bd808>	*nod*
[14:49]  <      hoo>	But before the hhvm switch it mostly just worked fine
[14:49]  <      hoo>	(and full api logs are also important, to stress that)

I took a look at GELF as a transport option and found it lacking. The Monolog GELF formatter works around GELF's lack of support for lists (as would normally be used for stack traces) by embedding JSON-encoded data in a string key that we would have to rehydrate on the Logstash side.
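Rehydrating the embedded JSON on the Logstash side would mean something like a json filter in the pipeline config — the source field name here is hypothetical, standing in for whatever key the Monolog GELF formatter actually uses:

```
filter {
  json {
    source => "exception_trace"   # hypothetical field holding embedded JSON
    target => "exception"
  }
}
```

It works, but it couples the Logstash config to an implementation detail of the PHP formatter, which is part of why GELF looks less attractive here.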

> Thought about this a bit and actually have had this same conversation in the past. I don't think rsyslog is clearly a better option but if I was making the call I would choose it for these reasons:
>
>   • Sending to local rsyslog means that in the event of whacky things happening (like a cascading failure that happens to bring logstash down while being coupled with an unforeseen network error) we can route logs to local disk on specific hosts to troubleshoot trivially
>   • Rsyslog is another gear in the mechanism but with pretty low overhead. I think also that listening for logs locally is more or less standard practice anyway. Not much in the configuration overhead department.
>   • It is true we don't do log review in any real fashion and the vast majority of our logs pass silently, but the periods where we really care about doing reconstruction of a timeline seem to be self-selecting as periods where network errors happen and thus we increase the risk of losing the only logs we actually care about.
>   • This is more or less free resiliency and flexibility in the log routing pipeline that also maintains a standard
>
> That being said, making it so this isn't a synchronous process with client requests was the only real actionable so I'm good with both proposed solutions Just wanted to add my 2 cents :)

As I see it, the debate thus far has settled down to whether or not to employ rsyslog on the localhost as an intermediary between MediaWiki's Monolog logging stack and Logstash.

Pros:

  • Log forwarding rules in a semi-central place
    • Apache2 and HHVM already relay logs to Logstash via rsyslog forwarding rules
  • Ability to tweak rsyslog rules to dump logs locally if needed in a debugging scenario
  • Ability to use TCP over datagrams for forwarding to Logstash
  • (?) Ability to buffer log events in case of Logstash or network routing failure

Cons:

  • Need to configure rsyslog to ensure that large datagrams (up to 64K) are collected and forwarded properly
  • Yet another moving part that might fail in the log stack
  • (?) Round-robin use of multiple Logstash backends for each rsyslog instance difficult

The easiest way to move forward from the MediaWiki side as I see it would be to merge my Syslog over UDP Handler and configure MediaWiki to talk directly to Logstash. Rsyslog could be inserted as an intermediary if and when Ops decide that the merits of having it outweigh the negatives. Switching at that point would only require a config change in MediaWiki to alter the host and port that Monolog was emitting datagrams to.
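In other words, the later cut-over to an rsyslog intermediary should reduce to a destination change in config, something like the following — the variable names are hypothetical, not the actual wmf-config settings:

```php
<?php
// Phase 1: Monolog emits syslog UDP datagrams straight at Logstash.
// 'logstash1001' is one of the existing aggregator hosts mentioned above.
$wmgLogstashSyslogHost = 'logstash1001'; // placeholder value
$wmgLogstashSyslogPort = 10514;          // placeholder value

// Phase 2 (if/when Ops inserts rsyslog): same handler, same config
// shape, only the destination changes.
// $wmgLogstashSyslogHost = '127.0.0.1';
```

The handler and formatter code paths stay identical across both phases, which is what makes the incremental rollout cheap.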

Does this sound like a reasonable plan or should I go back to the drawing board and look for other options?

Seems reasonable to me :). We can always adjust as you say.

I have everything working in beta to ship log events to Logstash using syslog UDP datagrams sent directly from MediaWiki to the Logstash server. See T88870 for details. We could start rolling this to production in parallel with the upcoming 1.25wmf18 branch.

Backporting all the necessary pieces to 1.25wmf17 would be kind of a pain and until we have new hardware for the Logstash Elasticsearch cluster (T89402) a full deployment would be likely to cause traffic volume related outages/gaps in Logstash anyway.

gerritbot added a subscriber: gerritbot.

Change 191259 had a related patch set uploaded (by BryanDavis):
logstash: Ship logs via syslog udp datagrams

https://gerrit.wikimedia.org/r/191259


bd808 added a project: MediaWiki-Core-Team.
bd808 moved this task from Backlog to In Dev/Progress on the MediaWiki-Core-Team board.
bd808 changed the task status from Open to Stalled. Feb 24 2015, 12:29 AM

The configuration change to begin testing this in production has been blocked in code review based on the complexity and current method of generating the Monolog logger configuration. I have neither the time nor the energy currently to rethink and rewrite the entire configuration mechanism and I'm honestly not sure that the simple configuration desired by the reviewer is possible with the current state of MediaWiki logging.

ori added a comment. Feb 24 2015, 12:45 AM

If you are not sure that using Monolog is a good solution, evaluate it further in Labs or another development environment before enabling it in production. If you are sure that Monolog is a good solution, then there is no reason to shirk the work of introducing it properly, which is by deprecating $wgDebugLogGroups and rewriting the configuration blocks that utilize it in operations/mediawiki-config to use Monolog instead.

In T88732#1061071, @ori wrote:

> If you are not sure that using Monolog is a good solution, evaluate it further in Labs or another development environment before enabling it in production. If you are sure that Monolog is a good solution, then there is no reason to shirk the work of introducing it properly, which is by deprecating $wgDebugLogGroups and rewriting the configuration blocks that utilize it in operations/mediawiki-config to use Monolog instead.

Fair enough. The specifics of how the Monolog config will be presented in mediawiki-config may be the point of debate. A direct Monolog config will look something like https://gerrit.wikimedia.org/r/#/c/175604/3/wmf-config/logging-labs.php,unified. Will that pass muster with you, or will it also be rejected for being too complex? MaxSem wasn't excited about that level of config verbosity, which in part led to the first generated config I introduced in https://gerrit.wikimedia.org/r/#/c/178977/1/wmf-config/logging-labs.php,unified.

It was also a nice thing to be able to generate the config at that point so that it would not drift by having two separate logging systems to configure during the transition from the LegacyLogger to Monolog. In some sense we are back in that position again until we are sure that the syslog UDP transport will work for us at production scale. I want to go through testing this on group0 followed by group1 before finally re-enabling it for all wikis. The volume of log messages emitted in beta really doesn't give anything a workout as far as load testing goes.

I am also not keen to re-enable all wikis logging to Logstash until we get the new hardware for the backing Elasticsearch cluster and migrate over to using it. Based on past performance I think we can just handle group0 + group1 traffic. Adding in the Wikipedia sites brought our little logging cluster to its knees last time and I see no reason to believe that it won't do it again.

Maybe what we could do for now is only configure a small number of logging channels (exceptions being most useful in my opinion and the api data that Brad needs for his extension being next on the list) across all wikis to go to Logstash and leave the rest going only to the udp2log service on fluorine.
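A partial rollout like that could be expressed as a per-channel map in the logging configuration, along these lines — the structure and variable name here are illustrative, not the actual wmf-config schema:

```php
<?php
// Only high-value channels ship to Logstash for all wikis; everything
// else continues to flow only to udp2log on fluorine.
$wmgMonologChannels = [ // hypothetical variable name
    'exception' => [ 'udp2log' => true, 'logstash' => true ],
    'api'       => [ 'udp2log' => true, 'logstash' => true ],
    'default'   => [ 'udp2log' => true, 'logstash' => false ],
];
```

Flipping a channel's 'logstash' flag then becomes a one-line config change, which makes it easy to ramp volume up as the Elasticsearch cluster allows.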

RandomDSdevel awarded a token.

I have a new and improved version of the mediawiki-config patch (https://gerrit.wikimedia.org/r/#/c/191259/) that I hope better addresses @ori's concerns than my prior attempts. The config is still more complex than I would ultimately like to see, but until all of core is converted to PSR-3 style logging with appropriate log levels for each message (T88649) I think it's about as compact as it can really be.

Change 191259 merged by jenkins-bot:
debug logging: Convert to Monolog logging

https://gerrit.wikimedia.org/r/191259

Change 209170 had a related patch set uploaded (by BryanDavis):
Send group0 group1 MediaWiki events to logstash

https://gerrit.wikimedia.org/r/209170

Change 209172 had a related patch set uploaded (by BryanDavis):
Send MediaWiki events for all wikis to Logstash

https://gerrit.wikimedia.org/r/209172

Change 209170 merged by jenkins-bot:
Send group0 group1 MediaWiki events to logstash

https://gerrit.wikimedia.org/r/209170

Change 209172 merged by jenkins-bot:
Send MediaWiki events for all wikis to Logstash

https://gerrit.wikimedia.org/r/209172

bd808 closed this task as Resolved. May 11 2015, 3:17 PM

All wikis are logging to Logstash again using the syslog UDP datagram transport.

Restricted Application added a subscriber: Matanya. Jul 6 2015, 4:01 PM
Restricted Application added a project: User-bd808. Jul 29 2016, 8:19 PM
bd808 moved this task from To Do to Archive on the User-bd808 board. Sep 1 2016, 5:14 PM