Page MenuHomePhabricator

Implement workarounds in RESTBase and Flow to hit Parsoid/PHP REST API endpoints without an oldid for titles containing "."
Closed, DeclinedPublic

Description

See T232556: Parsoid/PHP roundtrip testing: Period ( . ) in titles fail with a http 403 (Invalid file extension found in the path info or query string.) error.

ssastry@scandium:~$ curl -L -x scandium.eqiad.wmnet:80 "http://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/page/html/G.729d"
<!DOCTYPE html>
<html><head><title>Forbidden</title></head>
<body><h1>Forbidden</h1><p>Invalid file extension found in the path info or query string.</p></body></html>

ssastry@scandium:~$ time curl -L -x scandium.eqiad.wmnet:80 "http://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/page/html/G.729d/"
<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head prefix="mwr: http:////en.wikipedia.org/wiki/Special:Redirect/"><meta charset="utf-8"/><meta property="mw:html:version" content="2.1.0"/><link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/G.729d"/><title>G.729d</title><base href="//en.wikipedia.org/wiki/"/><link rel="stylesheet" href="/w/load.php?modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Csite.styles%7Cext.cite.style%7Cext.cite.styles%7Cmediawiki.page.gallery.styles&amp;only=styles&amp;skin=vector"/><!--[if lt IE 9]><script src="/w/load.php?modules=html5shiv&amp;only=scripts&amp;skin=vector&amp;sync=1"></script><script>html5.addElements('figure-inline');</script><![endif]--></head><body data-parsoid='{"dsr":[0,19,0,0]}' class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr"><section data-mw-section-id="0" data-parsoid="{}"><link rel="mw:PageProp/redirect" href="./G.729" data-parsoid='{"src":"#REDIRECT ","a":{"href":"./G.729"},"sa":{"href":"G.729"},"dsr":[0,19,null,null]}'/></section></body></html>

So, if RESTBase and Flow need to make any request to fetch Parsoid HTML for a title without the oldid, and the title contains ., you will have to use a workaround by appending a / to the request so it doesn't trigger the IE6/IE7 protection in MediaWiki core (which is slated to be dropped as per the approved RFC T232563). Given the Parsoid/PHP deployment timeline, it makes sense for this workaround to be implemented in RESTBase & Flow since it seems unlikely that the core code triggering the 403 will be dropped before the deployment.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
ssastry renamed this task from Implement workarounds in RESTBase and Flow while accessing theHTMLwithout a oldid for Parsoid/PHP REST API for titles containing "." to Implement workarounds in RESTBase and Flow to hit Parsoid/PHP REST API endpoints without an oldid for titles containing ".".Oct 10 2019, 3:05 PM
ssastry triaged this task as High priority.

To be clear (and I'll edit this if I turn out to be wrong), this is a *future* not *current* problem. It will break when we switch to Parsoid/PHP from Parsoid/JS.

VE currently calls back via the api.php endpoint (action=visualeditor, paction=parsefragment or =parsedoc), and that internally calls the "virtual rest" URL (no actual network traffic involved) which then dispatches to RESTbase and from there to Parsoid/JS running its own node server.

In the future, RESTbase will be calling back into the PHP MediaWiki cluster via the rest.php endpoint to invoke Parsoid/PHP, and that's the path which has troubles with dots.

So someone needs to ensure that the potentially problematic paths have slashes at the end. That can be done now, since it shouldn't break anything with Parsoid/JS currently (worth double-checking!).

One Q is what amount of processing is done by RESTbase -- is it sufficient for VE to add the trailing slash, and will RESTbase always then pass along the trailing slash to Parsoid? (I'm not certain!)

It seems like it would make more sense for RESTbase to have responsibility for protecting the paths before it does its recursive request to Parsoid. That adds some annoying service-specific code to the RESTBase dispatcher -- but at this point that request-munging should happen before *any* dispatch to a REST service hosted within MediaWiki, not just Parsoid.

ssastry updated the task description. (Show Details)

subbu: would percent-encoding the . as %2E help, or is this IE6/7 hack done after percent-decoding?

(Answering my own question:)

$ curl -x scandium.eqiad.wmnet:80 "http://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/page/html/G%2E729d"
<!DOCTYPE html>
<html>
<head>
<title>Security redirect</title>
</head>
<body>
<h1>Security redirect</h1>
<p>
We can't serve non-HTML content from the URL you have requested, because
Internet Explorer would interpret it as an incorrect and potentially dangerous
content type.</p>
<p>Instead, please use <a href="http://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/page/html/G%2E729d?&amp;*">this URL</a>, which is the same as the
URL you have requested, except that "&amp;*" is appended. This prevents Internet
Explorer from seeing a bogus file extension.
</p>
</body>
</html>

And note that adding ?&* to the query doesn't actually solve the issue:

$ curl -x scandium.eqiad.wmnet:80 "http://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/page/html/G%2E729d?&*"
<!DOCTYPE html>
<html>
<head>
<title>Security redirect</title>
</head>
<body>
<h1>Security redirect</h1>
<p>
We can't serve non-HTML content from the URL you have requested, because
Internet Explorer would interpret it as an incorrect and potentially dangerous
content type.</p>
<p>Instead, please use <a href="http://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/page/html/G%2E729d?&amp;*&amp;*">this URL</a>, which is the same as the
URL you have requested, except that "&amp;*" is appended. This prevents Internet
Explorer from seeing a bogus file extension.
</p>
</body>
</html>

ad infinitum.

From what I can tell, RESTBase never asks for a page without revision, so it should be fine there.

Not needed anymore after the fixes in T239666#5707977 and following.

So this should have been a deployment blocker :/