Page MenuHomePhabricator

MWExceptionHandler reqId sometimes differs from php-wmerrors reqId
Closed, ResolvedPublic

Description

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-deploy-2020.03.16/mediawiki?id=AXDky3uCKWrIH1QRCes2

timestamp: 2020-03-16T19:21:42.517620+00:00
host: mw1378
type: mediawiki
channel: exception
caught_by: /etc/php/php7-fatal-error.php via php-wmerrors
reqId:  Xm-RwwpAMNwAAlOE5gwAAABA # <-- different
message:

[Xm-RwwpAMNwAAlOE5gwAAABA] PHP Fatal error: Allowed memory size of 692060160 bytes exhausted (tried to allocate 1179648 bytes) in /srv/mediawiki/php-1.35.0-wmf.23/includes/cache/MessageCache.php on line 283

And the exact same error instance reported by the MW side:
https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-deploy-2020.03.16/mediawiki/?id=AXDky3vkh3Uj6x1zZRwi

timestamp:  2020-03-16T19:21:42+00:00
host: mw1378
type: mediawiki
channel: exception
caught_by:  mwe_handler
reqId: Xm-RwwpAMFwAALzTp1IAAACY # <-- different
unique_id: Xm-RwwpAMNwAAlOE5gwAAABA # <!-- (interesting)
message:

[Xm-RwwpAMFwAALzTp1IAAACY] /w/api.php?…   PHP Fatal Error: Allowed memory size of 692060160 bytes exhausted (tried to allocate 1179648 bytes) in /srv/mediawiki/php-1.35.0-wmf.23/includes/cache/MessageCache.php on line 283

This should be the same. Without that, the standard practice of searching by reqId is missing results.

Event Timeline

Krinkle added a subscriber: WDoranWMF.

@WDoranWMF This may be a good issue for Clinic Duty to look into how core's MWExceptionHandler works. I covered most if it already, but this would apply it in practice. I haven't investigated yet, but am happy to help whomever does.

Anomie moved this task from Inbox to Triage Meeting Inbox on the Platform Engineering board.
Anomie subscribed.

Within MediaWiki, the request ID is determined inside WebRequest::getRequestId() and considers three sources (in order):

  1. If $wgAllowExternalReqID is true (as it is on Wikimedia sites) and the X-Request-Id HTTP header is present, its value is used.
  2. If $_SERVER['UNIQUE_ID'] is set (e.g. from Apache mod_unique_id), its value is used.
  3. A random value is generated on the first call.

The description above mentions /etc/php/php7-fatal-error.php; checking that on mwmaint1002, I see it only considers $_SERVER['UNIQUE_ID'] when generating "reqId" and prefixing the message. It would have to be updated to also consider X-Request-Id first before falling back to UNIQUE_ID. I note that file is in operations/puppet, where CPT in general doesn't have merge rights (so tagging SRE).

As for "unique_id" in the log data, that comes from the vendor class Monolog\Processor\WebProcessor and also reflects only $_SERVER['UNIQUE_ID'].

Change 582947 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] WebRequest: Avoid unused wfRandomString() call

https://gerrit.wikimedia.org/r/582947

Change 582948 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] mediawiki: Fix reqId in php7-fatal-error.php to consider X-Request-Id

https://gerrit.wikimedia.org/r/582948

Change 582948 merged by Alexandros Kosiaris:
[operations/puppet@production] mediawiki: Fix reqId in php7-fatal-error.php to consider X-Request-Id

https://gerrit.wikimedia.org/r/582948

Anomie reassigned this task from holger.knust to Krinkle.
Anomie added a subscriber: holger.knust.

Change 582947 merged by jenkins-bot:
[mediawiki/core@master] WebRequest: Avoid unused wfRandomString() call

https://gerrit.wikimedia.org/r/582947