mtail rc35 stops incrementing atsmtail counters
Closed, Resolved (Public)

Description

Evidenced by the check_trafficserver_log_fifo_analytics_tls and check_trafficserver_log_fifo_notpurge_backend checks reporting unknown status on May 30.

mtail logs the same error across the DC a couple of days after the upgrade:

May 30 21:26:19 cp5011 atsmtail-tls[46540]: E0530 21:26:19.037737   46542 log_watcher.go:198] fsnotify error: fsnotify queue overflow

Restarting the service produces many repeats of this entry:

Jun 01 19:45:09 cp5011 atsmtail-tls[34714]: 2020/06/01 19:45:08 Unable to read from socket: dial unix /srv/trafficserver/tls/var/run/analytics.sock: connect: connection refused

A second restart is needed to get metrics flowing again.
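
For anyone trying to tell whether a restart actually took, here is a minimal sketch of a probe for the "connection refused" condition above. It assumes the socket is a Unix stream socket (which is what Go's `dial unix` uses); the function name `socket_accepts_connections` and the timeout default are made up for illustration and are not part of mtail or any existing check.

```python
import socket


def socket_accepts_connections(path, timeout=1.0):
    """Return True if a Unix stream socket at `path` accepts a connection.

    Hypothetical helper for illustration only.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return True
    except OSError:
        # Covers ConnectionRefusedError (socket file exists, nothing is
        # listening) and FileNotFoundError (socket file is missing).
        return False
    finally:
        s.close()
```

Calling it with the path from the log line, e.g. `socket_accepts_connections("/srv/trafficserver/tls/var/run/analytics.sock")`, would return False while the error is occurring. Note that a successful connect only shows something is listening, not that metrics are being counted.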

Event Timeline

colewhite triaged this task as Medium priority.

Mentioned in SAL (#wikimedia-operations) [2020-06-01T20:14:13Z] <shdubsh> downgrade mtail to rc5 in ulsfo -- T254192

Change 601430 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] varnish: add support for additional mtail args and set disable_fsnotify in eqsin

https://gerrit.wikimedia.org/r/601430

Change 601436 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: add additional mtail args support and set disable_fsnotify in eqsin

https://gerrit.wikimedia.org/r/601436

Change 601440 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: add additional flags support for atsmtail and disable_fsnotify in eqsin

https://gerrit.wikimedia.org/r/601440

Rolled back codfw and ulsfo, but left eqsin for testing the -disable_fsnotify flag. There doesn't appear to be any user-facing impact aside from metrics no longer incrementing once this condition is hit.
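
Since the only symptom is counters that stop incrementing, a rough way to spot the condition is to look for a counter whose value has been flat for too long. The sketch below is an assumption-laden illustration, not an existing check: `counter_stalled`, the `(unix_time, value)` sample format, and the `min_span` threshold are all hypothetical, and a counter can also be legitimately flat when there is no traffic.

```python
def counter_stalled(samples, min_span):
    """Return True if the latest counter value has been unchanged for at
    least `min_span` seconds.

    `samples` is a list of (unix_time, value) pairs sorted by time.
    Hypothetical helper for illustration only.
    """
    if not samples:
        return False
    last_t, last_v = samples[-1]
    flat_since = last_t
    # Walk backwards to find the earliest sample in the trailing flat run.
    for t, v in reversed(samples):
        if v != last_v:
            break
        flat_since = t
    return last_t - flat_since >= min_span
```

For example, `counter_stalled([(0, 1), (60, 2), (120, 2), (180, 2)], 120)` is True because the value has stayed at 2 for 120 seconds, while a series that changes on every sample is not flagged.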

Change 601436 merged by Cwhite:
[operations/puppet@production] profile: add additional mtail args support and set disable_fsnotify in eqsin

https://gerrit.wikimedia.org/r/601436

Change 601440 merged by Cwhite:
[operations/puppet@production] profile: add additional flags support for atsmtail and disable_fsnotify in eqsin

https://gerrit.wikimedia.org/r/601440

Change 601430 merged by Cwhite:
[operations/puppet@production] varnish: add support for additional mtail args and set disable_fsnotify in eqsin

https://gerrit.wikimedia.org/r/601430

This issue hasn't resurfaced since disabling fsnotify. Moving forward with the upgrade.