API returns 'no-external-page' for existing pages with titles that look like a domain name (e.g. "ubermorgen.com" or "delocator.net")
Open, Needs Triage · Public

Description

This happens when manually trying to link from a Wikibase item to a wiki page:

  1. click 'edit' on a site link
  2. enter site, in my case 'en'
  3. on 'page', enter for example 'delocator.net'
  4. the suggester shows the existing page on the wiki, with its correct title "Delocator.net"
  5. click on the suggested title
  6. click 'save'
  7. reply from wiki api: "The specified article could not be found on the corresponding site."

POST request log:

REQUEST:

POST https://catalog.rhizome.org/api.php
Host: catalog.rhizome.org
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: https://catalog.rhizome.org/w/Item:Q1209
Content-Length: 173
Cookie: __qca=P0-1263044144-1401371737783; _ga=GA1.2.305842596.1401371738; wikiUserID=1; wikiUserName=Dragan+Espenschied; wikiToken=1591938d54cc4b3ee6b584497e728e81; __gads=ID=1f58b5fe4b3cb56e:T=1423877704:S=ALNI_MaRjXz_nRNjCTj8XrWrQ-OLelSUJA; wiki_session=98p2gd8b7i2ec49vmluga91qj3
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

action=wbsetsitelink&format=json&id=Q1209&linksite=RhizomeCatalog&linktitle=Delocator%2Enet&baserevid=8660&badges=&bot=1&token=6e052b1011b386bd7e6517a3c1fc0cb054f612af%2B%5C

RESPONSE (only body):

{
    "error" : {
        "code" : "no-external-page",
        "info" : "The external client site 'RhizomeCatalog' did not provide page information for page 'Delocator.net'."
    },
    "messages" : {
        "*" : "See https://catalog.rhizome.org/api.php for API usage"
    }
}
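
To confirm that the title in the POST body was sent correctly, the logged body can be decoded. This is only a minimal sketch: `LOGGED_BODY` is copied from the log above with the `token` parameter left out, and `decode_form_body` is a hypothetical helper, not part of Wikibase or Pywikibot.

```python
from urllib.parse import parse_qs

# Body copied from the POST log above; the token parameter is left out here.
LOGGED_BODY = (
    "action=wbsetsitelink&format=json&id=Q1209&linksite=RhizomeCatalog"
    "&linktitle=Delocator%2Enet&baserevid=8660&badges=&bot=1"
)


def decode_form_body(body: str) -> dict:
    """Decode a form-urlencoded body into a flat {name: value} dict."""
    parsed = parse_qs(body, keep_blank_values=True)
    # Each parameter occurs once in this log, so keeping the first value is enough.
    return {name: values[0] for name, values in parsed.items()}


params = decode_form_body(LOGGED_BODY)
print(params["linksite"], params["linktitle"])
```

The decoded `linktitle` is the plain title with the dot, so the percent-encoded `%2E` in the request is not what makes the lookup fail.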

This happens with any wiki page whose title looks like a domain name, for example ubermorgen.com, rhizome.org, and so forth. Creating such wiki pages, or giving items labels of this form, works without problems.

Pywikibot is also affected when it tries to set site links automatically.

despens created this task. Mar 3 2015, 8:14 PM

BTW, the proof that 'Delocator.net' exists as a wiki page:

GET REQUEST:

https://catalog.rhizome.org/api.php?action=query&prop=info&redirects=true&converttitles=true&titles=Delocator.net

RESPONSE:

{
    "warnings": {
        "main": {
            "*": "Unrecognized parameter: '*'"
        },
        "query": {
            "*": "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."
        }
    },
    "query": {
        "pages": {
            "5558": {
                "pageid": 5558,
                "ns": 0,
                "title": "Delocator.net",
                "contentmodel": "wikitext",
                "pagelanguage": "en",
                "touched": "2015-03-03T19:54:33Z",
                "lastrevid": 8675,
                "length": 13,
                "new": ""
            }
        }
    }
}
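
A script can read the same answer out of this response. The sketch below assumes the legacy JSON layout shown above, in which a page that does not exist carries a `missing` key; `existing_titles` is a hypothetical helper name, and `RESPONSE_TEXT` is a trimmed copy of the response with the warnings omitted.

```python
import json

# Trimmed copy of the query response above (warnings and some fields omitted).
RESPONSE_TEXT = """
{"query": {"pages": {"5558": {"pageid": 5558, "ns": 0,
 "title": "Delocator.net", "lastrevid": 8675, "length": 13, "new": ""}}}}
"""


def existing_titles(response: dict) -> list:
    """Return the titles of all pages the API reports as existing."""
    pages = response.get("query", {}).get("pages", {})
    return [
        page["title"]
        for page in pages.values()
        # Assumption: missing or invalid titles are flagged with these keys.
        if "missing" not in page and "invalid" not in page
    ]


print(existing_titles(json.loads(RESPONSE_TEXT)))
```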

despens added a comment. Edited Mar 3 2015, 11:54 PM

Apparently, normalizePageName in MediaWikiSite doesn't follow redirects. MediaWikiSite writes this error log:

2015-03-03 23:42:47 catalog wiki: call to <//catalog.rhizome.org/api.php?action=query&prop=info&redirects=1&converttitles=1&format=json&titles=Delocator.net> returned bad json: <html>
<head>
<title>Security redirect</title>
</head>
<body>
<h1>Security redirect</h1>
<p>
We can't serve non-HTML content from the URL you have requested, because
Internet Explorer would interpret it as an incorrect and potentially dangerous
content type.</p>
<p>Instead, please use <a href="https://catalog.rhizome.org/api.php?action=query&amp;prop=info&amp;redirects=1&amp;converttitles=1&amp;format=json&amp;titles=Delocator.net&amp;*">this URL</a>, which is the same as the
URL you have requested, except that "&amp;*" is appended. This prevents Internet
Explorer from seeing a bogus file extension.
</p>
</body>
</html>
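
A client could tell this page apart from a real API answer before reporting "bad json". The following is a rough sketch, not what MediaWikiSite does: `parse_api_body` and `_RedirectPage` are hypothetical names, and `SAMPLE` is a shortened copy of the logged HTML.

```python
import json
from html.parser import HTMLParser


class _RedirectPage(HTMLParser):
    """Collect the <title> text and all <a href> targets of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # HTMLParser has already turned &amp; back into & here.
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_api_body(body: str):
    """Return ("json", data) or ("security-redirect", url); raise otherwise."""
    try:
        return "json", json.loads(body)
    except ValueError:
        pass  # not JSON, check for the redirect page next
    page = _RedirectPage()
    page.feed(body)
    if page.title.strip() == "Security redirect" and page.links:
        return "security-redirect", page.links[0]
    raise ValueError("response is neither JSON nor a security redirect page")


# Shortened copy of the HTML from the error log above.
SAMPLE = (
    "<html>\n<head>\n<title>Security redirect</title>\n</head>\n<body>\n"
    '<p>Instead, please use <a href="https://catalog.rhizome.org/api.php?'
    "action=query&amp;prop=info&amp;redirects=1&amp;converttitles=1"
    '&amp;format=json&amp;titles=Delocator.net&amp;*">this URL</a></p>\n'
    "</body>\n</html>"
)

kind, target = parse_api_body(SAMPLE)
print(kind, target)
```

The extracted link is the original request with `&*` appended, which the caller could request instead.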

When appending &* to the original API request in includes/site/MediaWikiSite.php:136, like this:

$url = wfAppendQuery( $this->getFileUrl( 'api.php' ), $args ) . '&*';

the request works and normalizePageName returns a valid page name.
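
Why a domain-like title triggers the redirect, and why the suffix avoids it, can be illustrated with a deliberately simplified check. This is an assumption-laden sketch and not MediaWiki's actual detection code: it only tests whether the query string ends in a dot followed by letters or digits, and both function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Simplified assumption: a query string ending in ".<alphanumerics>" is
# treated as ending in a file extension.
_EXTENSION_AT_END = re.compile(r"\.[A-Za-z0-9]+$")


def query_ends_like_extension(url: str) -> bool:
    """True if the URL's query string ends in something like '.net'."""
    return bool(_EXTENSION_AT_END.search(urlsplit(url).query))


def append_star(url: str) -> str:
    """Append '&*' (once) so the query string no longer ends in an extension."""
    return url if url.endswith("&*") else url + "&*"


url = (
    "https://catalog.rhizome.org/api.php?action=query&prop=info"
    "&redirects=1&converttitles=1&format=json&titles=Delocator.net"
)
print(query_ends_like_extension(url))
print(query_ends_like_extension(append_star(url)))
```

Under this simplified rule, a title without a dot in its last position, such as `titles=Delocator`, would not be flagged at all, which matches the observation that only domain-like titles fail.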