
ATS: log inspection at runtime
Closed, Resolved, Public

Description

The ATS test cluster uses a named pipe as the configured log destination.

One of our requirements is that multiple programs, both daemons and interactive tools, can read live log entries as they appear at runtime.

One way to achieve this is a daemon that continuously reads from the named pipe and accepts connections from multiple clients. Each client can specify an optional filter and sees all log entries that match it.

@fgiunchedi mentioned nginx-log-peeker, which lacks the log filtering component but seems interesting.
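
A minimal sketch of the fan-out and filtering logic such a daemon would need, in Python. This is illustrative only and not the design of any existing tool: the `LogDemux` class, its method names, and the choice of regular expressions as the filter language are all assumptions. Sinks are plain callables so that the sketch stays independent of how clients actually connect.

```python
import re
from typing import Callable, Dict, Optional, Pattern, Tuple

Sink = Callable[[str], None]


class LogDemux:
    """Fan log lines out to subscribers, each with an optional regex filter.

    Hypothetical helper for illustration; names are not taken from any tool.
    """

    def __init__(self) -> None:
        self._next_id = 0
        self._clients: Dict[int, Tuple[Sink, Optional[Pattern[str]]]] = {}

    def subscribe(self, sink: Sink, pattern: Optional[str] = None) -> int:
        """Register a client; without a pattern it receives every line."""
        client_id = self._next_id
        self._next_id += 1
        regex = re.compile(pattern) if pattern else None
        self._clients[client_id] = (sink, regex)
        return client_id

    def unsubscribe(self, client_id: int) -> None:
        self._clients.pop(client_id, None)

    def dispatch(self, line: str) -> int:
        """Send one log line to every matching client; return the count."""
        delivered = 0
        # Iterate over a copy so that clients can be dropped while looping.
        for client_id, (sink, regex) in list(self._clients.items()):
            if regex is not None and regex.search(line) is None:
                continue
            try:
                sink(line)
                delivered += 1
            except OSError:
                # The client went away (e.g. broken pipe): forget about it
                # instead of letting one dead reader stall the others.
                self.unsubscribe(client_id)
        return delivered
```

In a real daemon the sink would presumably wrap a per-client socket write, and `dispatch` would be called once for every line read from the named pipe.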

Event Timeline

Restricted Application added a subscriber: Aklapper.
ema triaged this task as Medium priority. Sep 13 2018, 12:15 PM
ema moved this task from Backlog to Caching on the Traffic board.

Change 473232 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: logging.yaml support

https://gerrit.wikimedia.org/r/473232

Change 473232 merged by Ema:
[operations/puppet@production] ATS: logging.yaml support

https://gerrit.wikimedia.org/r/473232

Change 473432 had a related patch set uploaded (by Ema; owner: Ema):
[operations/software/fifo-log-demux@master] fifo-log-demux 0.1

https://gerrit.wikimedia.org/r/473432

Change 473503 had a related patch set uploaded (by Ema; owner: Ema):
[integration/config@master] Test fifo-log-demux with debian-glue-non-voting

https://gerrit.wikimedia.org/r/473503

Change 473554 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Add module to install and configure fifo-log-demux

https://gerrit.wikimedia.org/r/473554

Change 473555 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] trafficserver: configure fifo-log-demux

https://gerrit.wikimedia.org/r/473555

Change 473432 merged by Ema:
[operations/software/fifo-log-demux@master] fifo-log-demux 0.1

https://gerrit.wikimedia.org/r/473432

Mentioned in SAL (#wikimedia-operations) [2018-11-15T08:58:54Z] <ema> upload fifo-log-demux 0.1 to stretch-wikimedia T204225

Change 473554 merged by Ema:
[operations/puppet@production] Add module to install and configure fifo-log-demux

https://gerrit.wikimedia.org/r/473554

Change 473555 merged by Ema:
[operations/puppet@production] trafficserver: configure fifo-log-demux

https://gerrit.wikimedia.org/r/473555

Change 473705 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add atslog

https://gerrit.wikimedia.org/r/473705

Change 473705 merged by Ema:
[operations/puppet@production] ATS: add atslog

https://gerrit.wikimedia.org/r/473705

Change 473503 merged by jenkins-bot:
[integration/config@master] Test fifo-log-demux with debian-glue-non-voting

https://gerrit.wikimedia.org/r/473503

Change 473712 had a related patch set uploaded (by Ema; owner: Ema):
[integration/config@master] Test fifo-log-demux with debian-glue

https://gerrit.wikimedia.org/r/473712

Change 473712 merged by jenkins-bot:
[integration/config@master] Test fifo-log-demux with debian-glue

https://gerrit.wikimedia.org/r/473712

Change 474146 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/trafficserver@master] Drop 0011-logging-broken-pipe-no-spam.patch

https://gerrit.wikimedia.org/r/474146

Change 474156 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/trafficserver@master] trafficserver (8.0.0-1wm2) stretch-wikimedia; urgency=medium

https://gerrit.wikimedia.org/r/474156

Change 474146 merged by Ema:
[operations/debs/trafficserver@master] Drop 0011-logging-broken-pipe-no-spam.patch

https://gerrit.wikimedia.org/r/474146

Change 474156 merged by Ema:
[operations/debs/trafficserver@master] trafficserver (8.0.0-1wm2) stretch-wikimedia; urgency=medium

https://gerrit.wikimedia.org/r/474156

Mentioned in SAL (#wikimedia-operations) [2018-11-16T14:39:16Z] <ema> trafficserver 8.0.0-1wm2 uploaded to stretch-wikimedia T204225 T204209

There's a problem with fifo-log-demux reading from the pipe; reopening this task!

Change 474749 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: disable icinga notifications

https://gerrit.wikimedia.org/r/474749

Change 474749 merged by Ema:
[operations/puppet@production] ATS: disable icinga notifications

https://gerrit.wikimedia.org/r/474749

Change 474753 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: actually disable notifications

https://gerrit.wikimedia.org/r/474753

Change 474753 merged by Ema:
[operations/puppet@production] ATS: actually disable notifications

https://gerrit.wikimedia.org/r/474753

Yesterday all ATS hosts ran out of disk space. This was caused by trafficserver repeatedly logging messages like the following:

File:/var/log/trafficserver/notpurge.pipe was closed, have dropped (N) bytes.

Three things are going wrong and together cause those messages:

  1. trafficserver closes its open logpipes upon logging.yaml config reload. It should keep them open, the same way it does for regular files.
  2. traffic_server -C verify_config, which we currently use in the check_trafficserver_verify_config Icinga check, triggers a logging.yaml configuration reload instead of simply verifying that the syntax of the various configuration files is correct.
  3. Whenever trafficserver finds its logfiles (pipes included) closed, it logs an error message to stdout (and thus to syslog).

While (2) isn't optimal, logging has to keep working across configuration reloads in any case. Hence, we should fix (1) and make sure that fifo-log-demux re-opens the pipe if it gets closed.
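
The re-open behaviour asked for here could look roughly like the following Python sketch. It is not the fifo-log-demux implementation: the function name `follow_pipe` and its parameters (`max_sessions`, `retry_delay`) are hypothetical, and the sketch simply relies on the fact that a FIFO reader sees end-of-file once every writer has closed its end.

```python
import time
from typing import Callable, Optional


def follow_pipe(
    path: str,
    handle: Callable[[str], None],
    max_sessions: Optional[int] = None,
    retry_delay: float = 0.1,
) -> None:
    """Read lines from `path`, re-opening it whenever the writer goes away.

    `max_sessions` bounds the number of open/read-until-EOF cycles and exists
    mainly so the loop can terminate in tests; None means "run forever".
    """
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        try:
            # On a FIFO this open blocks until some writer opens the pipe.
            pipe = open(path, "r", errors="replace")
        except FileNotFoundError:
            # Assumption: the path may be missing for a moment if the writer
            # removes and recreates it, so wait and try again.
            time.sleep(retry_delay)
            continue
        with pipe:
            for line in pipe:
                handle(line.rstrip("\n"))
        # EOF: all writers closed their end. Count the session and re-open.
        sessions += 1
        time.sleep(retry_delay)
```

Re-opening by path rather than holding on to the old descriptor means the reader attaches to whatever currently exists at that path once the previous writer is gone.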

Change 474909 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: avoid verify_config for now

https://gerrit.wikimedia.org/r/474909

Change 474909 merged by Ema:
[operations/puppet@production] ATS: avoid verify_config for now

https://gerrit.wikimedia.org/r/474909

Regarding "1. trafficserver closes its open logpipes upon logging.yaml config reload":

s/closes/unlinks/, that is, trafficserver unlinks the pipes rather than merely closing them. Bug filed upstream: https://github.com/apache/trafficserver/issues/4635
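
Since the pipe is unlinked rather than only closed, a reader that keeps its old file descriptor may stay attached to a FIFO that no longer exists in the filesystem. One way a reader or a monitoring check could notice this is to compare the inode behind its descriptor with the inode currently at the path. The Python sketch below shows that idea only; `fifo_replaced` is a hypothetical helper, and nothing here is claimed to match what fifo-log-demux or check_trafficserver_log_fifo actually do.

```python
import os
import stat


def fifo_replaced(fd: int, path: str) -> bool:
    """Return True if `path` no longer refers to the FIFO open on `fd`."""
    held = os.fstat(fd)
    try:
        current = os.stat(path)
    except FileNotFoundError:
        # The path was unlinked and nothing has been created in its place.
        return True
    if not stat.S_ISFIFO(current.st_mode):
        # Something exists at the path, but it is not a named pipe.
        return True
    # Same device and inode means the descriptor still matches the path.
    return (held.st_dev, held.st_ino) != (current.st_dev, current.st_ino)
```

When this returns True, the reader would close its descriptor and open the path again.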

Change 475291 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add check_trafficserver_log_fifo

https://gerrit.wikimedia.org/r/475291

Change 475291 merged by Ema:
[operations/puppet@production] ATS: add check_trafficserver_log_fifo

https://gerrit.wikimedia.org/r/475291

Change 475294 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: pass fifo filename to check_trafficserver_log_fifo

https://gerrit.wikimedia.org/r/475294

Change 475294 merged by Ema:
[operations/puppet@production] ATS: pass fifo filename to check_trafficserver_log_fifo

https://gerrit.wikimedia.org/r/475294

Change 475313 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] fifo-log-demux: restart upon writer restart

https://gerrit.wikimedia.org/r/475313

Change 475313 merged by Ema:
[operations/puppet@production] fifo-log-demux: restart upon writer restart, remove cargo cult

https://gerrit.wikimedia.org/r/475313

Change 476018 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: log X-Cache-Status and X-Client-IP

https://gerrit.wikimedia.org/r/476018

Change 476018 merged by Ema:
[operations/puppet@production] ATS: log X-Cache-Status and X-Client-IP

https://gerrit.wikimedia.org/r/476018

Change 478158 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: configuration settings for named pipe logging

https://gerrit.wikimedia.org/r/478158

Change 478158 merged by Ema:
[operations/puppet@production] ATS: configuration settings for named pipe logging

https://gerrit.wikimedia.org/r/478158

Change 479612 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: do not log varnishcheck requests

https://gerrit.wikimedia.org/r/479612

Change 479612 merged by Ema:
[operations/puppet@production] ATS: do not log varnishcheck requests

https://gerrit.wikimedia.org/r/479612