Page MenuHomePhabricator

MW REST API, wiki titles, Apache2 and URI decoding fun
Closed, InvalidPublic

Description

Apache2 decodes incoming URI paths before sending them to PHP. This worked well in a world where 99.9% of requests would get rewritten into index.php?... form, but having MW sit behind it now that the REST API is on the horizon poses a problem, since REST API paths get matched using those same forward slashes. For example, the URI /w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User%3APchelolo%2FOnDemand_Test/275844 gets passed to PHP as /w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User:Pchelolo/OnDemand_Test/275844 which then cannot be matched by MW's REST router since the route definition is /w/rest.php/{domain}/v3/page/{format}/{title}/{revision}. Therefore, for the needs of the REST API, all of the path parameters that could contain slashes should be encoded.

Since the decoding is still needed for all non-REST routes, the question is: can we have the decoding not done only for them? If not, what are some alternatives that can be explored?

Originally reported as T235375: Parsoid/PHP responds with 404 for titles with slashes and revision, see there for more (detailed) info.

Event Timeline

According to Apache2 documentation, setting [AllowEncodedSlashes NoDecode](https://httpd.apache.org/docs/2.4/mod/core.html#allowencodedslashes) in the config should solve the problem, but when I tried it in Beta it didn't seem to work. Either way, even if that setting does work, the question is can its effect be limited to only a specific route prefix, i.e. /w/rest.php.

Taking a quick dump of the $_SERVER variable shows the content of REQUEST_URI are - as expected - unencoded.

Not sure what MediaWiki uses - probably SCRIPT_URL ?

 ["PATH_TRANSLATED"]=>
 string(120) "/srv/mediawiki/docroot/wikipedia.org/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User:Pchelolo/OnDemand_Test/275844"
 ["PATH_INFO"]=>
 string(84) "/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User:Pchelolo/OnDemand_Test/275844"
 ["SCRIPT_NAME"]=>
 string(11) "/w/rest.php"
 ["REQUEST_URI"]=>
 string(99) "/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User%3APchelolo%2FOnDemand_Test/275844"
 ["SCRIPT_FILENAME"]=>
 string(47) "/srv/mediawiki/docroot/wikipedia.org/w/rest.php"
 ["SCRIPT_URI"]=>
 string(131) "http://en.wikipedia.beta.wmflabs.org/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User:Pchelolo/OnDemand_Test/275844"
 ["SCRIPT_URL"]=>
 string(95) "/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User:Pchelolo/OnDemand_Test/275844"
["PHP_SELF"]=>
 string(95) "/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User:Pchelolo/OnDemand_Test/275844"

No, MediaWiki [prefers REQUEST_URI](https://github.com/wikimedia/mediawiki/blob/a03feb8ab05c29ecdf9905dd950986d82ec1590e/includes/WebRequest.php#L863). But your output does not coincide with mine. I altered the code in-place on parsoid10 yesterday and sent back via a header the contents of REQUEST_URI and got back the decoded version. Also, the fact that MW's router 404s on that route confirms that MW sees the decoded version.

If I set AllowEncodedSlashes to NoDecode it works as intended, but:

  • It must be set at the vhost level, which is a breaking change
  • We have inconsistent settings across different wikis (and thus we have a encoded_slashes variable in our mediawiki::web::vhost definition in puppet)

After further examination, given MediaWiki indeed uses REQUEST_URI, and given that is unencoded in all cases, I don't think the issue you were experiencing has anything to do with the apache configuration, but rather with something internal to MediaWiki.

Even with AllowEncodedSlashes set to On it works as intended.

One thing we should do, though, is to set AllowEncodedSlashes on all wikis or the rest API won't work as intended I guess.

Joe closed this task as Invalid.EditedOct 15 2019, 7:20 AM

Just confirmed the problem is at the varnish layer, not at the mediawiki one. I modified rest.php to print out the value of REQUEST_URI and it shows:

# VARNISH
$ curl -H 'X-Forwarded-Proto: https' -H'Host: en.wikipedia.beta.wmflabs.org' http://deployment-cache-text05/w/rest.php/en.wikipedia.beta.wmflabs.orv3/page/pagebundle/User%3APchelolo%2FOnDemand_Test/275844 -v
...
X-WTF: /w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User:Pchelolo/OnDemand_Test/275844

# APACHE
$ curl -H 'X-Forwarded-Proto: https' -H'Host: en.wikipedia.beta.wmflabs.org' http://deployment-mediawiki-parsoid10/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User%3APchelolo%2FOnDemand_Test/275844 -v
...
X-WTF: /w/rest.php/en.wikipedia.beta.wmflabs.org/v3/page/pagebundle/User%3APchelolo%2FOnDemand_Test/275844

So the problem is at the varnish layer. I'll close this task as invalid so you can open a new one for the traffic team.

Thank you @Joe for taking the time to investigate. I created T235478: Varnish/ATS should not decode URIs for /w/rest.php to follow up with the traffic folks.