Page MenuHomePhabricator

Format Monolog messages to be easier to grep, bucket them
Closed, ResolvedPublic1 Estimated Story Points

Description

Bug is resolved once MediaWiki errors are bucketed into a new logfile and replicated with the others.

Include some more information in the logline, like @Jgreen suggests in T107918:

Oct 16 09:25:04 payments1001 mediawiki[123]: {arbitrary message text}

Implementation

Set logging configuration in LocalSettings.php:

$wgMWLoggerDefaultSpi = array(
  'class' => 'MediaWiki\Logger\MonologSpi',
  'args' => array( array(
  'formatters' => array(
      'line' => array(
      'class' => 'Monolog\Formatter\LineFormatter',
-     'args' => array( '%message%' ),
+     'args' => array( 'mediawiki[%extra.process_id%]: %message%' ),
      ),
  ),

Event Timeline

awight renamed this task from Format Monolog messages to be easier to grep to Format Monolog messages to be easier to grep, bucket them.
awight claimed this task.
awight raised the priority of this task from to High.
awight updated the task description. (Show Details)
awight set Security to None.
awight edited a custom field.
awight added subscribers: atgo, Aklapper, awight, Jgreen.

I think this is the one, gotta test it though:

'formatters' => array(
    'line' => array(
        'class' => 'Monolog\Formatter\LineFormatter',
        'args' => array( 'mediawiki[%extra.process_id%] %message%' ),
    ),
),

That was the one. Well, plus a missing colon. The stable part of these messages is "mediawiki[PID]: ", is that enuf to bucket on?

On my local, syslog reads like:

Nov 19 12:06:45 local mediawiki: mediawiki[1054]: [15405221] /index.php/Main_Page Exception from line 180 of /X/payments/includes/MediaWiki.php: BAD FOO #0 /X/payments/includes/MediaWiki.php(567): MediaWiki->performRequest() #1 /X/payments/includes/MediaWiki.php(415): MediaWiki->main() #2 /X/payments/index.php(41): MediaWiki->run() #3 {main}

@Jgreen
The new message format is deployed. These errors are very rare AFAIK, and I was not able to generate one without patching the code. Let me know if you think that's necessary.

Reassigning to JGreen to indicate that I'm too lazy to make a subtask.

@Jgreen
The new message format is deployed. These errors are very rare AFAIK, and I was not able to generate one without patching the code. Let me know if you think that's necessary.

I'm unable to find a single instance in logs since 11/10, either central or locally logged (I checked only payments1001). You sure this thing is on? Matching $programname on mediawiki should be no problem, but I'm not sure where to apply the filter since I'm not seeing any logs.

Okay, assigning back to myself--This is probably because no fatal errors have happened. I'll have to deploy a bug and tickle it.

Change 255303 had a related patch set uploaded (by Awight):
[TEMPORARY] Instrument to tickle error streams

https://gerrit.wikimedia.org/r/255303

@Jgreen
The "tickle" patch will allow us to cause loglines to go into the new fatal error stream. Only these two messages should be bucketed:

Nov 24 16:40:39 X mediawiki: mediawiki[19997]: [510dd5a1] [no req] Exception from line 26 of /X/payments/extensions/DonationInterface/maintenance/TestCrash.php: CRASHTEST: uncaught exception #0 /X/payments/maintenance/doMaintenance.php(103): TestCrash->execute() #1 /X/payments/extensions/DonationInterface/maintenance/TestCrash.php(38): include() #2 {main}

and

Nov 24 16:41:44 X mediawiki: mediawiki[20990]: [dfe2919d] [no req] ErrorException from line 264 of /X/payments/includes/exception/MWExceptionHandler.php: Fatal Error: Class undefined: FOO MediaWiki or an installed extension requires this class but it is not embedded directly in MediaWiki's git repository and must be installed separately by the end user. Please see <a href="https://www.mediawiki.org/wiki/Download_from_Git#Fetch_external_libraries">mediawiki.org</a> for help on installing the required components. #0 (): MWExceptionHandler::handleFatalError() #1 {main}

However, we should *not* see the following message in that bucket, it should appear in the normal payments.error stream:

CRASHTEST: error

I haven't deployed the code yet, just pasting those loglines while I had them available.

Change 255307 had a related patch set uploaded (by Awight):
[TEMPORARY] Instrument to tickle error streams

https://gerrit.wikimedia.org/r/255307

Change 255307 merged by jenkins-bot:
[TEMPORARY] Instrument to tickle error streams

https://gerrit.wikimedia.org/r/255307

Change 255303 merged by jenkins-bot:
[TEMPORARY] Instrument to tickle error streams

https://gerrit.wikimedia.org/r/255303

@Jgreen
I've tickled the logs with the tool I described above, timestamps are between Nov 26 01:30 to 01:40 UTC. I don't see anything anywhere yet... I'll leave the code deployed so we can generate more logs if needed...

@awight this looks like it should be closed?

The only difference I see between my configuration and production is pretty substantial: I'm using PHP 5.6 and HHVM, but production uses PHP 5.3.

Perhaps we can just bucket the line that PHP emits, by matching:

php: PHP Fatal error:

bd808 confirmed that PHP 5.3 is not able to catch these fatal errors, so matching PHP Fatal error is our only hope. Wheee!

Perhaps we can just bucket the line that PHP emits, by matching:

php: PHP Fatal error:

Yup we can do that. I see 80 or so of those in the payment-listener-misc log. What would you like for the new bucket file name?

Looks like we could use a bucket for "PHP Warning:" as well...

directed into file payments.warning?

Hehe, yeah that sounds perfect. Sorry that I was too sad about that naming
convention to draw the final dotted line. It bothers me that we use both
dot and hyphen, but I'm sure there was a good reason...

Hah well that's easily remedied if we care enough!