
Figure out what to do with `fatalmonitor` script
Open, Needs Triage, Public

Description

fatalmonitor is a script on mwlog1001 that continuously shows the most common PHP errors. SWAT deployers are supposed to keep an eye on it while deploying, so that they can notice upcoming problems early (1, 2).

However, its current implementation is specific to HHVM (it reads /srv/mw-log/hhvm.log). As a result, it has been completely broken for a week or so, because /srv/mw-log/hhvm.log no longer exists, and it has been of doubtful usefulness ever since we started sending significant amounts of traffic to PHP7.

What should we do with it?


Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 1 2019, 4:22 PM
Joe added a subscriber: Joe. Oct 3 2019, 5:51 AM

A few observations:

  • We have logstash dashboards that should help; however, logstash is sometimes hosed and lags further behind than the on-disk logs.
  • If we want a file on disk to follow, it would probably be the php7 one, but that needs additional parsing.
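To illustrate what "additional parsing" of an on-disk log could look like, here is a minimal sketch of a fatalmonitor-like loop in Python. It is not the original script. The log path is passed on the command line, and the normalization rules (collapsing quoted strings, hex addresses and standalone numbers so that repeats of the same error group together) are assumptions, since the actual php7 log format isn't described here. Rotation handling and severity filtering are left out.

```python
import re
import sys
import time
from collections import Counter, deque
from typing import Iterable, Iterator, List, Tuple

# Assumed normalization rules (hypothetical, not from the real log format):
# replace volatile tokens so the same error from different requests
# collapses into one "signature".
_VOLATILE = [
    (re.compile(r'"[^"]*"'), '"?"'),          # quoted strings
    (re.compile(r"0x[0-9a-fA-F]+"), "ADDR"),  # hex addresses / ids
    (re.compile(r"\b\d+\b"), "N"),            # line numbers, sizes, ids
]


def normalize(line: str) -> str:
    """Reduce a raw log line to an error signature."""
    sig = line.strip()
    for pattern, repl in _VOLATILE:
        sig = pattern.sub(repl, sig)
    return sig


def top_errors(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most common signatures with their counts, skipping blank lines."""
    counts = Counter(normalize(l) for l in lines if l.strip())
    return counts.most_common(n)


def follow(path: str, poll: float = 1.0) -> Iterator[str]:
    """Yield complete lines appended to path, roughly like `tail -f`."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        fh.seek(0, 2)  # start at the current end of the file
        buf = ""
        while True:
            chunk = fh.readline()
            if not chunk:
                time.sleep(poll)
                continue
            buf += chunk
            if buf.endswith("\n"):  # only emit fully written lines
                yield buf
                buf = ""


def main(path: str, window: int = 5000, every: int = 500, n: int = 10) -> None:
    """Print the top-n signatures over the last `window` lines, every `every` lines."""
    recent: deque = deque(maxlen=window)
    for i, line in enumerate(follow(path), start=1):
        recent.append(line)
        if i % every == 0:
            for sig, count in top_errors(recent, n):
                print(f"{count:6d}  {sig}")
            print("-" * 40, flush=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python3 fatal_top.py <path-to-php7-log>`; the script name and the window sizes are illustrative. Counting over a sliding window rather than the whole file keeps the output focused on what is happening during the current deployment.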

As @Joe said, we can use https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor instead of the script, until it is fixed.

Now the script itself has been removed from mwlog1001, it seems. Apparently this was done in I871c8e3241 as part of T229792.