/a/mw-log/hhvm.log: file not found on Fluorine
Closed, Resolved, Public

Description

We didn't have an Icinga notification for this one.
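A freshness check along the following lines could have raised an alert here. This is only a minimal sketch, not the check that was or is deployed: the function name, the path in the usage comment, and the age threshold are illustrative assumptions. The exit codes follow the usual Nagios/Icinga plugin convention (0 for OK, 2 for CRITICAL).

```python
import os
import time

# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2


def check_log_freshness(path, max_age_seconds, now=None):
    """Return (exit_code, message) for a log file that should be written continuously.

    Hypothetical helper: a missing file and a stale file are both treated as CRITICAL.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return CRITICAL, f"CRITICAL: {path} does not exist"
    age = now - mtime
    if age > max_age_seconds:
        return CRITICAL, f"CRITICAL: {path} not modified for {age:.0f}s"
    return OK, f"OK: {path} modified {age:.0f}s ago"


# Example use (threshold chosen arbitrarily for illustration):
#   code, message = check_log_freshness("/a/mw-log/hhvm.log", 300)
#   print(message); raise SystemExit(code)
```

Treating "file not found" as CRITICAL rather than UNKNOWN is a design choice: for a log that is expected to exist at all times, a missing file is itself the failure.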

Event Timeline

thcipriani subscribed.

Hmm...but logstash fatalmonitor still appears to be working fine.

I definitely lean on fatalmonitor a lot during SWATs.

So now the output of fatalmonitor on fluorine is:

Every 2.0s: tail -n 1000 /a/mw-log/hhvm.log |    Wed Aug 31 18:36:06 2016

tail: cannot open `/a/mw-log/hhvm.log' for reading: No such file or directory

greg triaged this task as Unbreak Now! priority. Aug 31 2016, 6:41 PM
greg subscribed.

This is pretty important and (hopefully not) indicative of something else going on.

I have made it a blocker for tonight's deployment. Tailing the log file gives a much faster response time than logstash. But maybe I am getting old.

On fluorine, ps -u udp2log f shows a wide range of python and sh processes in zombie state, but I think that is a known issue.
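To list the defunct processes rather than eyeball them, something like the sketch below could be used. It is an illustration only: the helper names are made up, and it asks `ps` for explicit `pid`, `stat` and `comm` columns instead of parsing the forest view used above. A state beginning with `Z` marks a zombie.

```python
import subprocess


def find_zombies(ps_output):
    """Return (pid, command) pairs for lines whose process state starts with 'Z'.

    Expects lines of the form "<pid> <stat> <command>", as printed by
    `ps -o pid=,stat=,comm=` (the trailing '=' suppresses the header row).
    """
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, stat, comm = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), comm.strip()))
    return zombies


def list_zombies(user):
    """Run ps for one user and return that user's zombie processes."""
    result = subprocess.run(
        ["ps", "-u", user, "-o", "pid=,stat=,comm="],
        capture_output=True, text=True, check=False,
    )
    return find_zombies(result.stdout)


# Example use: list_zombies("udp2log")
```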

The process has been running for a while, and there is a pipe relaying to logstash1001.eqiad.wmnet which seems to work: mediawiki-errors on https://logstash.wikimedia.org/ has a bunch of messages.

Thus it seems the messages are properly relayed to fluorine but somehow are not written to disk. Maybe one of the defunct sh processes is responsible. I can't remember offhand how, or whether, demux.py writes to disk directly.

logrotate ran at 06:25.

stat hhvm.log-20160831.gz has:

Access: 2016-08-31 18:57:18.977526992 +0000
Modify: 2016-08-31 06:25:13.000000000 +0000
Change: 2016-08-31 08:19:13.792872275 +0000

stat apache2.log-20160831.gz

Access: 2016-08-30 17:41:16.000000000 +0000
Modify: 2016-08-31 06:25:23.000000000 +0000
Change: 2016-08-31 06:26:53.452036313 +0000

The root directory /a/mw-log has barely anything in it.

Maybe a Puppet change broke it.

/etc/rsyslog.d/30-remote-syslog.conf was changed on Aug 31 at 13:43.

This comment was removed by hashar.

Change 307812 had a related patch set uploaded (by Alex Monk):
Follow-up I6df802b9: Fix udp2log's demux.py

https://gerrit.wikimedia.org/r/307812

Change 307812 merged by ArielGlenn:
Follow-up I6df802b9: Fix udp2log's demux.py

https://gerrit.wikimedia.org/r/307812

AlexMonk-WMF claimed this task.
AlexMonk-WMF subscribed.

This was my fault, mostly. Fixed now.