Interwiki links should not be handled like local/static pages
Closed, DeclinedPublic

Description

Interwiki links are handled like local links rather than like external links. Consequently, the static HTML pages contain dead local URLs.

What I do:
I run dumpHTML.php on the "Wikipedia" article (http://en.wikipedia.org/wiki/Wikipedia).

What I get:
The interwiki links from "Template:Wikipedia" (http://en.wikipedia.org/wiki/Template:Wikipedias), which is included in the "Wikipedia" article, are rewritten as local URLs:


(...)
<tr style="">
<td class="navbox-group" style="">750,000+</td>
<td style="border-left: 2px solid rgb(253, 253, 253); padding: 0px; text-align: left; width: 100%;" class="navbox-list navbox-even">
<div style="padding: 0em 0.25em;"><span style="white-space: nowrap;"><a href="../../../../articles/g/e/r/German_Wikipedia_58de.html" title="German Wikipedia">German</a> <a href="../../../../../de/index.html" class="extiw" title="de:">de:</a></span></div>
</td>
</tr>

(...)

What I want:


(...)
</tr>
<tr style="">
<td class="navbox-group" style="">750,000+</td>
<td style="border-left: 2px solid rgb(253, 253, 253); padding: 0px; text-align: left; width: 100%;" class="navbox-list navbox-even">
<div style="padding: 0em 0.25em;"><span style="white-space: nowrap;"><a href="../../../../articles/g/e/r/German_Wikipedia_58de.html" title="German Wikipedia">German</a> <a href="http://de.wikipedia.org" class="extiw" title="de:">de:</a></span></div>
</td>
</tr>

(...)

The difference:

  • In the first case: <a href="../../../../../de/index.html" class="extiw" title="de:">de:</a>
  • In the second case: <a href="http://de.wikipedia.org" class="extiw" title="de:">de:</a>
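
One way to spot the affected links in a generated dump is to look for anchors that carry the "extiw" class but whose href is not an absolute URL. The following is a minimal sketch in Python, not part of DumpHTML; the class and function names are made up for illustration, and it assumes that interwiki anchors always carry class="extiw" as in the snippets above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExtiwLinkChecker(HTMLParser):
    """Collect interwiki anchors (class="extiw") whose href is not absolute."""

    def __init__(self):
        super().__init__()
        self.rewritten_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        href = attr.get("href") or ""
        parts = urlparse(href)
        # An interwiki link rewritten as a local path has neither a scheme
        # nor a host component.
        if "extiw" in classes and not parts.scheme and not parts.netloc:
            self.rewritten_hrefs.append(href)


def find_rewritten_interwiki_links(html):
    """Return the hrefs of interwiki anchors that point at local paths."""
    checker = ExtiwLinkChecker()
    checker.feed(html)
    return checker.rewritten_hrefs
```

Run against the two anchors above, the first case is reported and the second is not.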

Remark:
IMO the issue comes from the "GetFullURL" hook in dumpHTML.inc. Removing it (and the then-useless onGetFullURL()) seems to resolve the issue.
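
The behaviour I am asking for can be sketched roughly as follows. This is an illustration in Python, not the actual dumpHTML.inc code: the function names, the prefix map, and the "/wiki/" path are assumptions made for the example. The point is only that a title with a known interwiki prefix keeps its external URL, while everything else goes through the local path rewriting.

```python
def split_interwiki(title, interwiki_map):
    """Return (prefix, rest) if title starts with a known interwiki prefix,
    otherwise (None, title)."""
    prefix, sep, rest = title.partition(":")
    if sep and prefix.lower() in interwiki_map:
        return prefix.lower(), rest
    return None, title


def href_for(title, interwiki_map, local_href):
    """Pick the href for a link target.

    interwiki_map maps a prefix to an external base URL (hypothetical data);
    local_href is whatever function builds the static local path.
    """
    prefix, rest = split_interwiki(title, interwiki_map)
    if prefix is None:
        # Ordinary local page: rewrite to a static local path.
        return local_href(title)
    base = interwiki_map[prefix]
    # Interwiki target: keep it external instead of rewriting it.
    return base if not rest else base + "/wiki/" + rest.replace(" ", "_")
```

With a map such as {"de": "http://de.wikipedia.org"}, the target "de:" resolves to http://de.wikipedia.org, while "German Wikipedia" and "Template:Wikipedia" are still handed to the local path builder because "template" is not an interwiki prefix.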


Version: unspecified
Severity: enhancement

Details

Reference
bz16880

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:29 PM
bzimport set Reference to bz16880.
bzimport added a subscriber: Unknown Object (MLST).
Aklapper subscribed.

DumpHTML has been unmaintained and broken for many years and is being archived. Declining this task per T280185.