
Wikisource Export: References sometimes external link
Closed, Resolved | Public | 3 Estimated Story Points | BUG REPORT

Description

As a Wikisource user, I want the references bug fixed, so that I can see internal links within the book (rather than needing to leave the book & have internet access).

Background: References in a few ebooks are an external link, taking you to the wikisource website, rather than an internal link, taking you to the footnote in the ebook. So far, it has only happened for ebooks where the reference is in the article itself, not transcluded in a <pages> element. These are pretty rare though, so perhaps it is a coincidence. This happens in wsexport-test, but not production. This is potentially due to our work in T264788.

Examples:

Acceptance Criteria:

  • Restore previous behavior, so that footnote links are internal (rather than external) in ebook exports

Event Timeline

ARamirez_WMF set the point value for this task to 3. (Jan 12 2021, 11:54 PM)
ARamirez_WMF moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.

The same behaviour was observed on fr:Wikisource with links pointing to the same page (first char = #).
Example: https://wsexport.wmflabs.org/?lang=fr&page=Les_p%C3%A8res_du_syst%C3%A8me_tao%C3%AFste/Tao-Tei-King&format=epub-3&fonts=
The links in the summary will take you back to the Wikisource website.

PR: https://github.com/wsexport/tool/pull/322



@Denis_Gagne52 the above patch also fixes the issue you are describing

I still find external links on some ebooks.

https://wsexport-test.wmflabs.org/?lang=en&page=The_Variation_of_Animals_and_Plants_under_Domestication%2FXIV&format=epub-3&fonts=:

<sup id="ref_136" class="plainlinks" about="#mwt5" typeof="mw:Transclusion"><a rel="mw:WikiLink" href="https://en.wikisource.org/wiki/The_Variation_of_Animals_and_Plants_under_Domestication/XIV#endnote_136">[136]</a></sup>

https://wsexport-test.wmflabs.org/?lang=fr&page=Les_p%C3%A8res_du_syst%C3%A8me_tao%C3%AFste/Tao-Tei-King&format=epub-3&fonts=:

<a rel="mw:WikiLink" href="c0_Les_peres_du_systeme_taoiste_Tao_Tei_King.xhtml#CH03" title="Les pères du système taoïste/Tao-Tei-King" id="mwEQ">3 <span typeof="mw:Entity" id="mwEg"> </span> —</a> <span typeof="mw:Entity" id="mwEw"> </span>
<a rel="mw:WikiLink" href="https://fr.wikisource.org/wiki/Les_pères_du_système_taoïste/Tao-Tei-King#CH04" id="mwFA">4 <span typeof="mw:Entity" id="mwFQ"> </span> —</a> <span typeof="mw:Entity" id="mwFg"> </span>
<a rel="mw:WikiLink" href="https://fr.wikisource.org/wiki/Les_pères_du_système_taoïste/Tao-Tei-King#CH05" id="mwFw">5 <span typeof="mw:Entity" id="mwGA"> </span> —</a> <span typeof="mw:Entity" id="mwGQ"> </span>

(Note that some of the links for the above ebook are fixed, but others aren't.)

@Samwilson @dmaza How many of these should we expect to be fixed?

All <ref> reference links should be fixed, but some works use manual links to create footnotes, and these have not been fixed yet. We don't have a way to identify these, because they're effectively just normal wiki links (as you've pasted above). They used to be identifiable by the fact that they were same-document links, e.g. href="#foo", but Parsoid adds the page name to these, e.g. href="Lorem#foo". We'll add handling for these soon (perhaps in a separate ticket? I'm not sure).
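One way to recognise such manual links again is to compare the part of the href before the # with the title of the page being exported. The following is only a minimal sketch under stated assumptions, not the fix that was merged: the helper names samePageFragment and localizeSamePageLinks, the $pageTitle and $wikiBase parameters, and the set of prefixes handled (absolute wiki URL, "./Title", bare "Title") are all hypothetical.

```php
<?php
/**
 * Hypothetical helper (not part of WS Export): return the bare fragment
 * ("#foo") when $href points at the page being exported, or null otherwise.
 */
function samePageFragment( string $href, string $pageTitle, string $wikiBase ): ?string {
	$hashPos = strpos( $href, '#' );
	if ( $hashPos === false ) {
		return null; // no fragment, so this cannot be a same-document link
	}
	$target = substr( $href, 0, $hashPos );
	$fragment = substr( $href, $hashPos );
	if ( $fragment === '#' ) {
		return null; // empty fragment, nothing to link to
	}
	// Strip the assumed prefixes: an absolute wiki URL, or a leading "./".
	if ( strpos( $target, $wikiBase ) === 0 ) {
		$target = substr( $target, strlen( $wikiBase ) );
	} elseif ( strpos( $target, './' ) === 0 ) {
		$target = substr( $target, 2 );
	}
	// Compare titles with percent-encoding decoded and spaces as underscores.
	$normalize = static function ( string $t ): string {
		return str_replace( ' ', '_', rawurldecode( $t ) );
	};
	if ( $target === '' || $normalize( $target ) === $normalize( $pageTitle ) ) {
		return $fragment;
	}
	return null;
}

/**
 * Hypothetical wrapper: rewrite every same-page link in $dom to "#fragment".
 */
function localizeSamePageLinks( DOMDocument $dom, string $pageTitle, string $wikiBase ): void {
	foreach ( $dom->getElementsByTagName( 'a' ) as $node ) {
		$fragment = samePageFragment( $node->getAttribute( 'href' ), $pageTitle, $wikiBase );
		if ( $fragment !== null ) {
			$node->setAttribute( 'href', $fragment );
		}
	}
}
```

For the French example above, this could be called as localizeSamePageLinks( $dom, 'Les pères du système taoïste/Tao-Tei-King', 'https://fr.wikisource.org/wiki/' ). A bare "#fragment" only resolves inside the ebook if the target anchor ends up in the same XHTML file, so a real fix would also need to account for how the export splits pages into files.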


Ah, ok, thanks, I think I understand now.

References that use the conventional <ref> markup are now internal links, e.g.:

Books which use the link markup ([[link]]) are a bit harder to predict, and it seems to depend on what the exact link used is. E.g.:

Test environment: WS Export production version 2.3.2.

ifried subscribed.

This is now on production. As detailed in the discussion above, this work fixed the issue of external footnote links appearing (when the links should be internal) in Wikisource ebook exports. However, in works that use manual links to create footnotes, the issue remains. This is because we don't have an easy way to identify such footnote links. Since this would require extra work, and this work would likely be larger in scope, we are not prioritizing it for now, but it can potentially be revisited by our team, another team, or a volunteer developer at another time. For this specific ticket, I am marking it as Done, as it fixes many of the previous cases and any further work would need to be handled in a separate ticket.

This was working before and was broken during this project. I don’t understand the difficulty in identifying such links. As mentioned earlier, any link built this way, [[Les pères du système taoïste/Tao-Tei-King#CHAP1]], will work, but not one with only #CHAP1 inside the brackets. There are many links built that way. I think the conversion was taken care of in this function of BookCleanerEpub.php, but the first part no longer traps #mylink:

/**
 * change the internal links
 */
protected function setLinks( DOMDocument $dom ) {
	$list = $dom->getElementsByTagName( 'a' );
	/** @var DOMElement $node */
	foreach ( $list as $node ) {
		$href = $node->getAttribute( 'href' );
		$title = Util::encodeString( $node->getAttribute( 'title' ) ) . '.xhtml';

>>		if ( substr( $href, 0, 1 ) === '#' ) {

@Denis_Gagne52: Thanks for providing this information! I have added it to a new ticket (T275632), which specifically deals with this issue. Feel free to add in any more comments or information into that ticket.