Page MenuHomePhabricator

wgRequestId not being set to value of X-Request-ID header
Closed, ResolvedPublic

Description

This is set here:

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/includes/WebRequest.php#328

If $wgAllowExternalReqId is true, it should be using the contents of the X-Request-Id header, otherwise it falls back on the UNIQUE_ID environment variable, otherwise it uses wfRandomString(24).

X-Request-ID looks something like c58328de-3387-43b1-abf9-11b26b7be474, which is distinguishable from UNIQUE_ID and the random string fallbacks, both of which appear to be something like YBM9UQrAAGIAAGSPhxUAAAAH.

The corresponding profiler code is finding HTTP_X_REQUEST_ID no problem, and profiles are being saved with this value to XHGui:

https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/wmf-config/profiler.php#142

But looking at the Javascript globals on the page, I'm seeing a value in the form used by UNIQUE_ID and wfRandomString(), *not* X-Request-ID.

The net effect is that the "Find in XHGui" link is being populated incorrectly.

Event Timeline

Change 660063 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] WebRequest: Radically simplify getRequestId() method

https://gerrit.wikimedia.org/r/660063

Change 660063 merged by jenkins-bot:
[mediawiki/core@master] WebRequest: Radically simplify getRequestId() method

https://gerrit.wikimedia.org/r/660063

Okay, so I'm fairly certain I've found the culprit.

I came around to this not from the profiler.php angle, but from the Logstash angle. We are seeing the same issue there where fatal errors for MW from MWExceptionHandler (using WebRequest::getRequestId) are given a certain reqId value, and then later on if the fatal error handler also fails or was unable to contain the issue, then our php-wmerrors script (etc/php7-fatal-error.php) reports the cascading issue to Logstash but this one often gets a different reqId assigned which is rather confusing.

From puppet: php7-fatal-error.php:

$reqId = $_SERVER['HTTP_X_REQUEST_ID'] ?? $_SERVER['UNIQUE_ID'] ?? null;

From mediawiki: WebRequest.php:

# mediawiki: DefaultSettings.php
$wgAllowExternalReqID = false;

# operations/mediawiki-config: InitialiseSettings.php (via LocalSettings.php)
'wgAllowExternalReqID' => true,

# mediawiki: Setup.php
require 'DefaultSettings,php';
HeaderCallback::register();

/* … */
require 'LocalSettings.php';


# mediawiki: WebRequest::getRequestId
if ( !self::$reqId ) {
	global $wgAllowExternalReqID;
	if ( $wgAllowExternalReqID ) {
		$id = $_SERVER['HTTP_X_REQUEST_ID'] ?? $_SERVER['UNIQUE_ID'] ?? wfRandomString( 24 );
	} else {
		$id = $_SERVER['UNIQUE_ID'] ?? wfRandomString( 24 );
	}
	self::$reqId = $id;
}

The two are nearly identical, and in terms of the source and fallback behaviour they are identical and work perfectly fine when reached. The problem is that the above code doesn't run the way one might think.

See, the first call to WebRequest::getRequestId() happens in Setup.php before LocalSettings.php is applied. Specifically, it happens via HeaderCallback::register() which funnily enough the line of code in HeaderCallback::register is not even storing the reqId value. It is explicitly documenting as calling the method so that the reqId gets computed and cached.

mediawiki HeaderCallback.php
	public static function register() {
		// T261260 load the WebRequest class, which will be needed in callback()
 		/* … */

		// Likewise, cache the request ID.
		\WebRequest::getRequestId();

Change 670279 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] HeaderCallback: Remove pre-caching of WebRequest::getRequestId

https://gerrit.wikimedia.org/r/670279

Krinkle triaged this task as Medium priority.

Change 670279 merged by jenkins-bot:
[mediawiki/core@master] HeaderCallback: Remove pre-caching of WebRequest::getRequestId

https://gerrit.wikimedia.org/r/670279