We didn't have an Icinga notification for this one.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Follow-up I6df802b9: Fix udp2log's demux.py | operations/puppet | production | +1 -1
Revisions and Commits
Status | Assigned | Task
---|---|---
Resolved | hashar | T142117 MW-1.28.0-wmf.17 deployment blockers
Resolved | AlexMonk-WMF | T144389 /a/mw-log/hhvm.log: file not found on Fluorine
Event Timeline
Hmm...but logstash fatalmonitor still appears to be working fine.
I definitely lean on fatalmonitor a lot during SWATs.
So now the output of fatalmonitor on fluorine is:
Every 2.0s: tail -n 1000 /a/mw-log/hhvm.log |    Wed Aug 31 18:36:06 2016
tail: cannot open `/a/mw-log/hhvm.log' for reading: No such file or directory
I have made it a blocker for tonight's deployment; tailing the log file gives a much faster response time than logstash. But maybe I am getting old.
On fluorine, ps -u udp2log f shows a wide range of python and sh processes in zombie state, but I think that is a known issue.
The process has been running for a while, and there is a pipe relaying to logstash1001.eqiad.wmnet which seems to work: mediawiki-errors on https://logstash.wikimedia.org/ has a bunch of messages.
So it seems the messages are properly relayed to fluorine but somehow are not written to disk. Maybe one of the defunct sh processes is responsible; I can't remember offhand how, or whether, demux.py writes to disk directly.
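For readers unfamiliar with the demultiplexing step being discussed, here is a minimal sketch of the general idea, not the actual demux.py. It assumes each incoming line starts with a channel name followed by a space, and that each channel is appended to its own file under a base directory. The function name `demux`, the `.log` suffix, and the channel-name rule are all hypothetical.

```python
import os
import re
import tempfile

# Hypothetical rule: only simple channel names, so a line cannot
# make the writer escape the base directory.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9_.-]*")


def demux(lines, basedir):
    """Append each 'channel message' line to basedir/<channel>.log.

    Returns a dict mapping channel name to the number of lines written.
    This is an illustrative sketch, not the real udp2log demux.py.
    """
    handles = {}
    counts = {}
    try:
        for line in lines:
            channel, sep, text = line.partition(" ")
            if not sep or not SAFE_NAME.fullmatch(channel):
                continue  # skip lines without a usable channel prefix
            if channel not in handles:
                path = os.path.join(basedir, channel + ".log")
                handles[channel] = open(path, "a")
            if not text.endswith("\n"):
                text += "\n"
            handles[channel].write(text)
            counts[channel] = counts.get(channel, 0) + 1
    finally:
        # Close every file we opened, even if a write failed.
        for fh in handles.values():
            fh.close()
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demux(["hhvm example message\n"], d))
```

If a writer like this dies or is replaced by a broken version, the upstream relay can keep working while nothing new reaches disk, which would match the symptom described here.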
logrotate ran at 06:25.
stat hhvm.log-20160831.gz has:
Access: 2016-08-31 18:57:18.977526992 +0000
Modify: 2016-08-31 06:25:13.000000000 +0000
Change: 2016-08-31 08:19:13.792872275 +0000
stat apache2.log-20160831.gz has:
Access: 2016-08-30 17:41:16.000000000 +0000
Modify: 2016-08-31 06:25:23.000000000 +0000
Change: 2016-08-31 06:26:53.452036313 +0000
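To compare these timestamps across several rotated files at once, something like the following sketch could be used. It reads the same three fields that stat prints, in UTC. The helper name `stat_times` is made up for illustration, and it assumes a Linux host, where `st_ctime` is the inode change time.

```python
import os
import sys
from datetime import datetime, timezone


def stat_times(path):
    """Return the Access/Modify/Change times of path as UTC strings.

    Assumes Linux semantics: st_ctime is the inode change time
    (on Windows it is the creation time instead).
    """
    st = os.stat(path)

    def fmt(ts):
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        return dt.strftime("%Y-%m-%d %H:%M:%S +0000")

    return {
        "Access": fmt(st.st_atime),
        "Modify": fmt(st.st_mtime),
        "Change": fmt(st.st_ctime),
    }


if __name__ == "__main__":
    # Usage: python stat_times.py FILE [FILE ...]
    for name in sys.argv[1:]:
        print(name)
        for label, value in stat_times(name).items():
            print(f"  {label}: {value}")
```

A Change time well after the Modify time, as on the hhvm.log archive above, means the inode was touched again later (for example by a rename, chmod, or chown), without the contents changing.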
The root dir /a/mw-log has barely anything in it.
Maybe a puppet change broke it.
After a bunch of investigation with @bd808 @ArielGlenn @Krenair :
Aug 30 19:48 /usr/local/bin/demux.py
Most probably caused by de93e39b978ae213c61b0ef76ac14e0240158110
Change 307812 had a related patch set uploaded (by Alex Monk):
Follow-up I6df802b9: Fix udp2log's demux.py