
/a/mw-log/hhvm.log: file not found on Fluorine
Closed, Resolved (Public)

Description

We didn't have an Icinga notification for this one.
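For illustration, a freshness check along these lines could have raised an alert when the file vanished. This is a minimal sketch, not the check that was actually deployed: the function name `check_log_fresh` and the 300-second threshold are hypothetical, and only the standard Nagios/Icinga plugin exit codes (0 OK, 2 CRITICAL) are assumed.

```python
import os
import time

# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2


def check_log_fresh(path, max_age_seconds=300, now=None):
    """Return (exit_code, message) for a log file freshness check.

    max_age_seconds is a hypothetical threshold; tune it to how often
    the log is expected to receive writes.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # This is the failure seen in this task: the file is gone.
        return CRITICAL, f"CRITICAL: {path} does not exist"
    age = now - mtime
    if age > max_age_seconds:
        return CRITICAL, f"CRITICAL: {path} last written {age:.0f}s ago"
    return OK, f"OK: {path} last written {age:.0f}s ago"
```

A wrapper script would call `check_log_fresh("/a/mw-log/hhvm.log")`, print the message, and exit with the returned code so the monitoring system can pick it up.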

Event Timeline

thcipriani subscribed.

Hmm...but logstash fatalmonitor still appears to be working fine.

I definitely lean on fatalmonitor a lot during SWATs.

So now the output of fatalmonitor on fluorine is:

Every 2.0s: tail -n 1000 /a/mw-log/hhvm.log |    Wed Aug 31 18:36:06 2016

tail: cannot open `/a/mw-log/hhvm.log' for reading: No such file or directory
greg triaged this task as Unbreak Now! priority. Aug 31 2016, 6:41 PM
greg subscribed.

This is pretty important and (hopefully not) indicative of something else going on.

I have made it a blocker for tonight's deployment; tailing the log file gives a much faster response time than logstash. But maybe I am getting old.

On fluorine, ps -u udp2log f shows a wide range of python and sh processes in zombie state, but I think that is a known issue.
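To make the zombie check repeatable, the ps output can be filtered on the process state column. A rough sketch, assuming the output comes from something like `ps -u udp2log -o pid,stat,args` (PID, state, command per line); the helper name `find_zombies` is made up for illustration.

```python
def find_zombies(ps_output):
    """Return (pid, command) pairs for defunct processes.

    Expects lines of the form "PID STAT COMMAND". A state code that
    starts with "Z" marks a zombie (defunct) process.
    """
    zombies = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # blank or malformed line
        pid, stat = parts[0], parts[1]
        command = parts[2] if len(parts) > 2 else ""
        # The header line ("PID STAT COMMAND") is skipped naturally,
        # because "STAT" does not start with "Z".
        if stat.startswith("Z"):
            zombies.append((int(pid), command))
    return zombies
```

Feeding it the captured ps text gives a short list of defunct PIDs to compare against the parent process that should be writing the log files.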

The process has been running for a while and there is a pipe relaying to logstash1001.eqiad.wmnet which seems to work. https://logstash.wikimedia.org/ mediawiki-errors has a bunch of messages.

Thus it seems the messages are properly relayed to fluorine but somehow are not written to disk. Maybe one of the defunct sh processes is responsible; I can't remember offhand how, or whether, demux.py writes to disk directly.

logrotate ran at 6:25

stat hhvm.log-20160831.gz has:

Access: 2016-08-31 18:57:18.977526992 +0000
Modify: 2016-08-31 06:25:13.000000000 +0000
Change: 2016-08-31 08:19:13.792872275 +0000

stat apache2.log-20160831.gz

Access: 2016-08-30 17:41:16.000000000 +0000
Modify: 2016-08-31 06:25:23.000000000 +0000
Change: 2016-08-31 06:26:53.452036313 +0000

The root dir /a/mw-log barely has anything.

Maybe a puppet change broke it.

/etc/rsyslog.d/30-remote-syslog.conf was changed on Aug 31 at 13:43
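The same kind of hunt for recently changed config can be scripted: list every file under a directory whose modification time is later than a reference point, such as the last logrotate run. This is only a sketch; `modified_since` is a hypothetical helper, and the directory and cutoff are up to whoever runs it.

```python
import os


def modified_since(root, since_epoch):
    """Return sorted (mtime, path) pairs for files under root
    modified after since_epoch (seconds since the Unix epoch)."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(full).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime > since_epoch:
                hits.append((mtime, full))
    return sorted(hits)
```

For example, passing `/etc` and the mtime of the most recently rotated log as the cutoff would list config files touched since that rotation, oldest first.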

This comment was removed by hashar.

Change 307812 had a related patch set uploaded (by Alex Monk):
Follow-up I6df802b9: Fix udp2log's demux.py

https://gerrit.wikimedia.org/r/307812

Change 307812 merged by ArielGlenn:
Follow-up I6df802b9: Fix udp2log's demux.py

https://gerrit.wikimedia.org/r/307812

AlexMonk-WMF claimed this task.
AlexMonk-WMF subscribed.

This was my fault, mostly. Fixed now.