Transwiki Special:Import via redirected interwiki URL spews PHP error about HTTP error
Closed, Resolved (Public)

Description

When testing a transwiki/interwiki import from Wikipedia to my local test wiki, I found I was getting this PHP error:

Warning: file_get_contents(http://www.wikipedia.org/wiki/Special:Export/About?history=1) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 411 Length Required
in /opt/web/pages/trunk/includes/HttpFunctions.php on line 83

My local interwiki table was using www.wikipedia.org URLs, which return an HTTP 301 redirect. Note that the local wiki doesn't have CURL set up and is using the file_get_contents() path.

A POST request is used for the import; I'm not 100% sure why. Unfortunately, file_get_contents() at least doesn't seem to handle redirects properly. It follows the redirect to the target URL, doing another POST, but apparently without all the headers. The Content-Length header, required for POSTs, is being lost, so the target server returns an HTTP error (411 Length Required) on the subrequest.

Proper behavior would be to use a GET in the first place, or to resubmit at the new URL with the correct headers. The HTTP 1.1 spec says a user agent MUST NOT automatically follow a redirect for a POST without user confirmation; in this case, though, the system's knowledge of wikis should be enough if we actually do it right. ;)
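
The two options above can be sketched roughly as follows. This is an illustration in Python, not MediaWiki's PHP; `resubmit_post_after_redirect` and its parameters are hypothetical names and do not exist in HttpFunctions.php.

```python
from urllib.parse import urljoin


def resubmit_post_after_redirect(url, headers, body, location, use_get=False):
    """Build the (method, url, headers, body) to send after a redirected POST.

    Hypothetical helper for illustration only.
    """
    # The Location header may be relative, so resolve it against the old URL.
    new_url = urljoin(url, location)
    # Drop headers tied to the old body or host; they are recomputed below
    # or by the HTTP client.
    new_headers = {
        k: v for k, v in headers.items()
        if k.lower() not in ("content-length", "host")
    }
    if use_get:
        # Option 1: fetch the target with a body-less GET.
        new_headers = {
            k: v for k, v in new_headers.items() if k.lower() != "content-type"
        }
        return "GET", new_url, new_headers, b""
    # Option 2: repeat the POST and restore the Content-Length that was lost.
    new_headers["Content-Length"] = str(len(body))
    return "POST", new_url, new_headers, body
```

Either branch avoids the 411 response: the GET carries no body at all, and the repeated POST states its body length explicitly.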

I'm not sure what happens in the CURL case; I haven't tested it.


Version: 1.12.x
Severity: enhancement

Details

Reference
bz11378

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:55 PM
bzimport set Reference to bz11378.
bzimport added a subscriber: Unknown Object (MLST).

Mass component change: <some> -> Export/Import

TTO subscribed.

This is a very old bug (8 years old). I'm finding it difficult to know how to set this up to reproduce the issue.

Having forced the use of the PHP HTTP engine (Http::$httpEngine = 'php'), I get the following errors in the debug output:

ImportStreamSource::newFromURL: opening https://www.wikipedia.org/wiki/Special:Export/Cassie
HTTP: POST: https://www.wikipedia.org/wiki/Special:Export/Cassie
[http] PhpHttpRequest: error opening connection: {errstr1}
[http] HTTP request failed due to unknown error.

{errstr1} is such a wonderfully useful error message, isn't it?

In e59f873bef67ef8531e2d19124b8019feca3b307 (committed a few months before this bug was filed), River Tarnell said that Special:Import should use POST for interwiki fetches, in order to avoid redirects. It's not clear why redirects should be avoided in this case.

I'm tempted to just make transwiki imports use GET - what possible reason could there be for using POST?
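
For illustration, a GET-based fetch would carry the export options in the query string, as in the URL from the original warning (Special:Export/About?history=1). A minimal sketch in Python, assuming an interwiki URL pattern with a `$1` placeholder; `export_url` is a hypothetical helper, not MediaWiki code.

```python
from urllib.parse import quote, urlencode


def export_url(interwiki_pattern, title, history=False):
    # Hypothetical helper: substitute the Special:Export page name into an
    # interwiki URL pattern such as "http://www.wikipedia.org/wiki/$1".
    page = "Special:Export/" + title.replace(" ", "_")
    url = interwiki_pattern.replace("$1", quote(page, safe="/:"))
    # Options go in the query string, so the request has no body and
    # needs no Content-Length header.
    params = {"history": "1"} if history else {}
    return url + ("?" + urlencode(params) if params else "")


print(export_url("http://www.wikipedia.org/wiki/$1", "About", history=True))
```

Because a GET has no body, a redirect can be followed without resending anything other than the URL.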

Filip claimed this task.
Filip subscribed.

@TTO: This no longer occurs in MediaWiki 1.21. I tried it on 3 articles.