Error
MediaWiki version: 1.35.0-wmf.1
PHP Warning: preg_match(): Compilation failed: two named subpatterns have the same name at offset 62
MediaWiki version: 1.35.0-wmf.1
PHP Warning: preg_match(): Compilation failed: two named subpatterns have the same name at offset 62
#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /includes/PathRouter.php(311): preg_match(string, string, NULL)
#2 /includes/PathRouter.php(285): PathRouter::extractTitle(string, stdClass)
#3 /includes/PathRouter.php(260): PathRouter->internalParse(string)
#4 /includes/WebRequest.php(205): PathRouter->parse(string)
#5 /includes/WebRequest.php(361): WebRequest::getPathInfo(string)
#6 /includes/Setup.php(780): WebRequest->interpolateTitle()
#7 /includes/WebStart.php(81): require_once(string)
#8 /index.php(41): require(string)
#9 /srv/mediawiki/w/index.php(3): require(string)
#10 {main}| Subject | Repo | Branch | Lines +/- | |
|---|---|---|---|---|
| Stop using SCRIPT_NAME where possible, rely on statically configured routing | mediawiki/core | master | +79 -44 |
Quick glance:
Some ad-hoc debugging on mwdebug1002 from this function reveals it tries three configured router paths:
PathRouter::extractTitle path = /wiki/Wikimedia_Taiwan/wiki/index.php5/$1 pattern=/wiki/$1 SCRIPT_NAME=/wiki/Wikimedia_Taiwan/wiki/index.php5/$1 PathRouter::extractTitle: path = /wiki/Wikimedia_Taiwan/wiki/index.php5/$1 pattern=/w/index.php/$1 SCRIPT_NAME=/wiki/Wikimedia_Taiwan/wiki/index.php5/$1 PathRouter::extractTitle: path = /wiki/Wikimedia_Taiwan/wiki/index.php5/$1 pattern=/wiki/Wikimedia_Taiwan/wiki/index.php5/$1/$1 SCRIPT_NAME=/wiki/Wikimedia_Taiwan/wiki/index.php5/$1
This last one is the likely cause and appears to be due to something also feeding the url as router path, which seems bad. And worse, it doesn't escape it in any way.
I imagine this might also explain why requests for static.php are calling this path, and more over, trying to match for a /w/static.php/$1 path which definitely isn't meant to happen, e.g. from https://meta.wikimedia.org/w/skins/Vector/images/arrow-down.svg
PathRouter::extractTitle: path = /w/skins/Vector/images/arrow-down.svg pattern=/wiki/$1 SCRIPT_NAME=/w/static.php PathRouter::extractTitle: path = /w/skins/Vector/images/arrow-down.svg pattern=/w/index.php/$1 SCRIPT_NAME=/w/static.php PathRouter::extractTitle: path = /w/skins/Vector/images/arrow-down.svg pattern=/w/static.php/$1 SCRIPT_NAME=/w/static.php
That seems wrong. $_SERVER['SCRIPT_NAME'] is supposed to be the current script's path, without the pathinfo part.
I imagine this might also explain why requests for static.php are calling this path, and more over, trying to match for a /w/static.php/$1 path which definitely isn't meant to happen, e.g. from https://meta.wikimedia.org/w/skins/Vector/images/arrow-down.svg
It seems that's intentional in WebRequest's code.
if ( isset( $_SERVER['SCRIPT_NAME'] ) && strpos( $_SERVER['SCRIPT_NAME'], '.php' ) !== false ) { // Check for SCRIPT_NAME, we handle index.php explicitly // But we do have some other .php files such as img_auth.php // Don't let root article paths clober the parsing for them $router->add( $_SERVER['SCRIPT_NAME'] . "/$1" ); }
That was added in rMWae1d5aefbf45: Update img_auth.php and WebRequest code to handle non index.php scripts like…, which mentions T34486: WebRequest::getPathInfo() broken in img_auth.php on DreamHost.
@D3r1ck01 - just for clarity, are you still proposing 565762: PathRouter: Allow duplicate names for subpatterns in RegEx as a fix for this one?
I believe you're right. The regular expression seems fine. One URL to MediaWiki can only have 1 primary title, extracted from one $-placeholder in the url path-pattern. The code that consumes the result is not prepared to handle an array of multiple values, and there can't be any way to process such array, since that is fundmantally is how MediaWiki works.
The fact that we concatenate two strings and accidentally create another "$1" pattern is an accident. The solution will need to be in preventing that problem from reaching the regular expression (higher level, the input to the regex). The solution cannot be to let the problem sink deeper (lower level, the regex).
I note that something like this just happend in group1 for 1.35.0-wmf.22:
2020-03-04T17:07:19 on metawiki
url /wiki/Wikimedia_Taiwan/wiki/index.php5/$1
unique id Xl-gRwpAICEAAAVeBEIAAAEY
https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-deploy-2020.03.04/mediawiki?id=AXCmhCNgarkxubcm-g_o&_g=h@44136fa should be a direct link.
I can reproduce this locally. PATH_INFO is not set, so SCRIPT_NAME just gets set to the same thing as REQUEST_URI. In ap_add_cgi_vars(), if r->path_info is set, the URI is split and SCRIPT_NAME receives the first part of the path, but it's not set.
mod_proxy_fcgi has some code for setting PATH_INFO, depending on the proxy-fcgi-pathinfo environment variable, but this code is not used when mod_proxy_fcgi is invoked via SetHandler. The proxy_fcgi_canon hook is never called.
Note that SCRIPT_NAME is the pre-rewrite URL, it does not even include the script name. Pretty much every usage of SCRIPT_NAME in core is broken by this.
Change 584817 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Stop using SCRIPT_NAME where possible, rely on statically configured routing
Change 584817 merged by jenkins-bot:
[mediawiki/core@master] Stop using SCRIPT_NAME where possible, rely on statically configured routing