
API returns 'no-external-page' for existing pages with titles that look like a domain name (e.g. "ubermorgen.com" or "delocator.net")
Open, High · Public

Description

This happens when manually trying to link from a Wikibase item to a wiki page:

  1. click 'edit' on a site link
  2. enter site, in my case 'en'
  3. on 'page', enter for example 'delocator.net'
  4. the suggester shows the existing page on the wiki, with its correct title "Delocator.net"
  5. click on the suggested title
  6. click 'save'
  7. reply from wiki api: "The specified article could not be found on the corresponding site."

POST request log:

REQUEST:

POST https://catalog.rhizome.org/api.php
Host: catalog.rhizome.org
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: https://catalog.rhizome.org/w/Item:Q1209
Content-Length: 173
Cookie: __qca=P0-1263044144-1401371737783; _ga=GA1.2.305842596.1401371738; wikiUserID=1; wikiUserName=Dragan+Espenschied; wikiToken=1591938d54cc4b3ee6b584497e728e81; __gads=ID=1f58b5fe4b3cb56e:T=1423877704:S=ALNI_MaRjXz_nRNjCTj8XrWrQ-OLelSUJA; wiki_session=98p2gd8b7i2ec49vmluga91qj3
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

action=wbsetsitelink&format=json&id=Q1209&linksite=RhizomeCatalog&linktitle=Delocator%2Enet&baserevid=8660&badges=&bot=1&token=6e052b1011b386bd7e6517a3c1fc0cb054f612af%2B%5C

RESPONSE (only body):

{
    "error" : {
        "code" : "no-external-page",
        "info" : "The external client site 'RhizomeCatalog' did not provide page information for page 'Delocator.net'."
    },
    "messages" : {
        "*" : "See https://catalog.rhizome.org/api.php for API usage"
    }
}
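
For readers checking what was actually submitted, here is a minimal sketch that decodes the form-encoded request body shown earlier. The helper name `decode_form_body` is made up for illustration, and the token is left out of the sample string on purpose. It only shows that `%2E` is an encoded dot, so the API received the title exactly as it exists on the wiki.

```python
from urllib.parse import parse_qs

# Request body from the log above, without the token parameter.
body = (
    "action=wbsetsitelink&format=json&id=Q1209&linksite=RhizomeCatalog"
    "&linktitle=Delocator%2Enet&baserevid=8660&badges=&bot=1"
)


def decode_form_body(raw):
    """Decode an application/x-www-form-urlencoded body into a flat dict.

    Hypothetical helper; keeps empty values such as 'badges='.
    """
    parsed = parse_qs(raw, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


params = decode_form_body(body)
print(params["linksite"], params["linktitle"])
```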

This happens with any wiki page whose title looks like a domain name, for example ubermorgen.com or rhizome.org. Creating such wiki pages, or labeling items with such names, works without problems.

Pywikibot is also affected when it tries to set site links automatically.

Event Timeline

despens raised the priority of this task to Needs Triage.
despens updated the task description.
despens subscribed.

BTW, the proof that 'Delocator.net' exists as a wiki page:

GET REQUEST:

https://catalog.rhizome.org/api.php?action=query&prop=info&redirects=true&converttitles=true&titles=Delocator.net

RESPONSE:

{
    "warnings": {
        "main": {
            "*": "Unrecognized parameter: '*'"
        },
        "query": {
            "*": "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."
        }
    },
    "query": {
        "pages": {
            "5558": {
                "pageid": 5558,
                "ns": 0,
                "title": "Delocator.net",
                "contentmodel": "wikitext",
                "pagelanguage": "en",
                "touched": "2015-03-03T19:54:33Z",
                "lastrevid": 8675,
                "length": 13,
                "new": ""
            }
        }
    }
}
Restricted Application added a subscriber: Unknown Object (MLST). · Mar 3 2015, 11:43 PM

Apparently, normalizePageName in MediaWikiSite doesn't follow redirects. MediaWikiSite writes this error log:

2015-03-03 23:42:47 catalog wiki: call to <//catalog.rhizome.org/api.php?action=query&prop=info&redirects=1&converttitles=1&format=json&titles=Delocator.net> returned bad json: <html>
<head>
<title>Security redirect</title>
</head>
<body>
<h1>Security redirect</h1>
<p>
We can't serve non-HTML content from the URL you have requested, because
Internet Explorer would interpret it as an incorrect and potentially dangerous
content type.</p>
<p>Instead, please use <a href="https://catalog.rhizome.org/api.php?action=query&amp;prop=info&amp;redirects=1&amp;converttitles=1&amp;format=json&amp;titles=Delocator.net&amp;*">this URL</a>, which is the same as the
URL you have requested, except that "&amp;*" is appended. This prevents Internet
Explorer from seeing a bogus file extension.
</p>
</body>
</html>

When appending &* to the original API request in includes/site/MediaWikiSite.php:136, like this:

$url = wfAppendQuery( $this->getFileUrl( 'api.php' ), $args ) . '&*';

the request works and normalizePageName returns a valid page name.
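
To try the same workaround outside MediaWiki, something like the following Python sketch might help. It is not the MediaWikiSite code: the function names `build_normalize_url` and `is_security_redirect` are invented for illustration, and the query parameters are copied from the error log above. It builds the lookup URL with the trailing `&*` and recognizes the "Security redirect" HTML page that was returned instead of JSON.

```python
from urllib.parse import urlencode


def build_normalize_url(api_url, title, append_star=True):
    """Build the page-info query URL seen in the error log.

    Hypothetical helper. With append_star=True the URL ends in '&*',
    so it no longer ends in something that looks like a file extension.
    """
    params = {
        "action": "query",
        "prop": "info",
        "redirects": 1,
        "converttitles": 1,
        "format": "json",
        "titles": title,
    }
    url = api_url + "?" + urlencode(params)
    if append_star:
        url += "&*"
    return url


def is_security_redirect(body):
    """Return True if the body is the HTML 'Security redirect' page."""
    return "<title>Security redirect</title>" in body


url = build_normalize_url("https://catalog.rhizome.org/api.php", "Delocator.net")
print(url)
```

A caller could check `is_security_redirect(response_text)` before attempting to parse JSON, and log a clearer message than "bad json" when it is true.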

Xqt subscribed.

Is this still valid? I cannot reproduce it.

Hi All

I was facing the same error with my Wikibase installation. At first I thought that the page really did not exist, but after a couple of days of troubleshooting I found that the Wikibase Repo, which was deployed on server (1), did not have direct network access to the Wikibase Client, which was deployed on server (2).

How did I check the connectivity? Only by using telnet.

For example, from server (1): telnet 10.10.10.10 25
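
If telnet is not installed, the same check can be approximated in Python. This is only a sketch: the function name `can_connect` and the 5-second default are assumptions, and the host and port in the usage note are the ones from the telnet example above. It reports whether a TCP connection can be opened at all, nothing more.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Hypothetical helper; any OSError (refused, unreachable, timed out)
    is treated as 'not reachable'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From server (1) this would be called as `can_connect("10.10.10.10", 25)`, or with whatever port the client wiki's web server listens on.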

What happens is the following:

  1. When you try to add a new site link, Wikibase internally checks whether the page exists by calling this API: /api.php?action=query&prop=info&redirects=true&converttitles=true&titles=Delocator.net
  2. This API call should return some basic info, such as the resolved page name and the namespace.
  3. In my case, because there was no direct access to the Wikibase client, the call waited for a response, and when none arrived within 5 seconds (the timeout limit), the page was treated as nonexistent.

I think the current error message is misleading and confuses users, because it reports a missing page when the real problem is that the client site could not be reached. It would be better to enhance the error handling to take this case into consideration.
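
As a rough illustration of what more specific error handling could look like, here is a Python sketch. It is not Wikibase code: `LookupResult`, `classify_lookup`, and the `fetch` callable are all hypothetical. The only fact it relies on from the MediaWiki API is that a nonexistent page is marked with a "missing" key in the query result. It separates "site unreachable" and "response was not JSON" from "page does not exist".

```python
import json
from enum import Enum


class LookupResult(Enum):
    FOUND = "found"
    MISSING = "missing"
    UNREACHABLE = "unreachable"
    BAD_RESPONSE = "bad-response"


def classify_lookup(fetch, title):
    """Classify the outcome of a page lookup on the client site.

    `fetch` is a hypothetical callable that returns the raw response
    body as a string, or raises OSError (which includes TimeoutError)
    when the site cannot be reached in time.
    """
    try:
        body = fetch(title)
    except OSError:
        return LookupResult.UNREACHABLE
    try:
        data = json.loads(body)
    except ValueError:
        # For example an HTML page instead of JSON.
        return LookupResult.BAD_RESPONSE
    if not isinstance(data, dict):
        return LookupResult.BAD_RESPONSE
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        if "missing" not in page and "invalid" not in page:
            return LookupResult.FOUND
    return LookupResult.MISSING
```

With a distinction like this, only `MISSING` would map to 'no-external-page', while the other two outcomes could produce their own error messages.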

Restricted Application added a subscriber: alaa. · Aug 4 2023, 6:55 PM
Xqt triaged this task as High priority. · Aug 5 2023, 11:19 AM

I don't see how this is related to Pywikibot.