
API: allredirects with arunique doesn't properly handle interwiki redirects
Open, Needs Triage, Public

Description

For example, consider this redirect on wiki.archlinux.org, from title ABS FAQ (日本語) to ja:ABS FAQ. A non-unique query returns it correctly, but adding arunique drops any indication that the target is interwiki, and in generator mode the query generates a page on the local wiki instead of the correct interwiki title.
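To make the comparison concrete, here is a minimal sketch that builds the two query URLs side by side without sending any request. The endpoint path (`/api.php`), the `arfrom` starting point, and the helper name `allredirects_url` are assumptions for illustration; the unique variant requests only `arprop=title`, since that is the one property I am confident is accepted together with `arunique`.

```python
from urllib.parse import urlencode

# Assumed endpoint: the task names the wiki but not its api.php location.
API_URL = "https://wiki.archlinux.org/api.php"


def allredirects_url(unique: bool) -> str:
    """Build a list=allredirects query URL, optionally with arunique set."""
    params = {
        "action": "query",
        "list": "allredirects",
        "arfrom": "ABS FAQ",  # assumed starting target title
        "arlimit": "10",
        "format": "json",
    }
    if unique:
        # Distinct targets only; no interwiki property is requested here.
        params["arprop"] = "title"
        params["arunique"] = "1"
    else:
        # Ask for the interwiki prefix alongside the target title.
        params["arprop"] = "title|interwiki"
    return API_URL + "?" + urlencode(params)


non_unique_url = allredirects_url(unique=False)
unique_url = allredirects_url(unique=True)
print(non_unique_url)
print(unique_url)
```

Fetching both URLs and comparing the entries for "ABS FAQ" should show whether the interwiki prefix survives in the unique result.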

Event Timeline

Lahwaacz raised the priority of this task from to Needs Triage.
Lahwaacz updated the task description.
Lahwaacz subscribed.
Lahwaacz set Security to None.

Further investigation shows that the "ABS FAQ" page in fact redirects to another page, i.e. it is a redirect source. There are many more similar entries in the query.

This turns out to be irrelevant. The reason "ABS FAQ" is included in the list is that arunique is broken with respect to interwiki redirects: the page ABS FAQ (日本語) is a redirect to ja:ABS FAQ, but arunique throws away the rd_interwiki field, so as a generator it incorrectly generates the local title rather than the correct interwiki title.
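The effect can be illustrated with a toy model. This is not MediaWiki's code, only a sketch of what happens when the interwiki prefix is dropped before deduplication; the `RedirectRow` type, the helper names, and the second (local) row are hypothetical.

```python
from typing import NamedTuple


class RedirectRow(NamedTuple):
    source: str     # title of the redirecting page
    namespace: int  # stands in for rd_namespace (unused in this toy model)
    title: str      # stands in for rd_title
    interwiki: str  # stands in for rd_interwiki; "" means a local target


ROWS = [
    RedirectRow("ABS FAQ (日本語)", 0, "ABS FAQ", "ja"),
    # Hypothetical local redirect, added only for contrast.
    RedirectRow("Some local redirect", 0, "Arch Build System", ""),
]


def format_target(title: str, interwiki: str) -> str:
    """Prefix the title with its interwiki code when there is one."""
    return f"{interwiki}:{title}" if interwiki else title


def unique_targets_dropping_interwiki(rows):
    """Deduplicate on the title alone, mimicking the reported behaviour."""
    return sorted({format_target(r.title, "") for r in rows})


def unique_targets_keeping_interwiki(rows):
    """Deduplicate on (title, interwiki), the expected behaviour."""
    return sorted({format_target(r.title, r.interwiki) for r in rows})


print(unique_targets_dropping_interwiki(ROWS))
print(unique_targets_keeping_interwiki(ROWS))
```

In the first result the Japanese redirect's target collapses to the local title "ABS FAQ"; in the second it stays "ja:ABS FAQ".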

Another incorrect entry is a valid redirect target page being marked as "missing" (whatever that means)

It means that the page doesn't exist on the wiki. It's entirely valid for generator=allredirects to return a missing title; Special:BrokenRedirects even exists to list such pages. prop=redirects works fine on these missing titles.

In this particular case, though, it's because redirects such as [[AboutWiki (日本語)]] → [[ja:ArchWiki:About]] are being incorrectly returned as above.

Anomie renamed this task from API: generator=allredirects yields incorrect results for a specific wiki to API: allredirects with arunique doesn't properly handle interwiki redirects. Jul 9 2015, 2:41 PM
Anomie updated the task description.
Anomie moved this task from Unsorted to Needs Code on the MediaWiki-Action-API board.
Anomie unsubscribed.

It looks like this'll probably take more work than I thought: without an index on (rd_namespace,rd_title,rd_interwiki), the database queries here can perform badly (including rd_interwiki in the SELECT sucks no matter what, and trying to include only non-interwiki via WHERE is vulnerable to the T97797 issue). So we'll probably have to add the index, or else figure out better queries.
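For reference, the two query shapes under discussion could look roughly like the sketch below, run against an in-memory SQLite table. Everything here is an assumption for illustration: the simplified `redirect` schema, the index name `rd_ns_title_iw`, and the sample rows (the extra source and the local target are made up). SQLite's planner also differs from the one MediaWiki runs on, so this shows only what the queries return, not how they perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-in for the redirect table (assumed schema).
    CREATE TABLE redirect (
        rd_from      INTEGER PRIMARY KEY,
        rd_namespace INTEGER NOT NULL,
        rd_title     TEXT    NOT NULL,
        rd_interwiki TEXT    NOT NULL DEFAULT ''
    );
    -- Hypothetical name for the index discussed above.
    CREATE INDEX rd_ns_title_iw
        ON redirect (rd_namespace, rd_title, rd_interwiki);
    """
)
conn.executemany(
    "INSERT INTO redirect VALUES (?, ?, ?, ?)",
    [
        (1, 0, "ABS FAQ", "ja"),  # the interwiki redirect from the report
        (2, 0, "ABS FAQ", "ja"),  # made-up second source, same target
        (3, 0, "ABS FAQ", ""),    # made-up local target with the same title
    ],
)

# Option 1: keep rd_interwiki in the distinct key.
distinct_targets = conn.execute(
    "SELECT DISTINCT rd_namespace, rd_title, rd_interwiki FROM redirect "
    "ORDER BY rd_namespace, rd_title, rd_interwiki"
).fetchall()

# Option 2: return only non-interwiki targets via WHERE.
local_only = conn.execute(
    "SELECT DISTINCT rd_namespace, rd_title FROM redirect "
    "WHERE rd_interwiki = '' ORDER BY rd_namespace, rd_title"
).fetchall()

print(distinct_targets)
print(local_only)
```

With the first query the local and interwiki targets remain distinct entries; with the second the interwiki target is excluded instead of being reported as a local page.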