Interwiki transclusion: set original Wikisource page as rel="canonical"
Open, Stalled, Low, Public

Description

  1. Look for an index page whose pagelist tag contains a language code (like <pagelist 36to533=it 566to576=it /> on https://en.wikisource.org/wiki/Index:The_Oxford_book_of_Italian_verse.djvu )
  2. Open a page which exists in the other language subdomain (e.g. https://en.wikisource.org/wiki/Page:The_Oxford_book_of_Italian_verse.djvu/114 )

I. Observed: the text is transcluded correctly (and the header says e.g. «This page does not need to be proofread. Its text comes from it.wikisource.org.»), but the page otherwise looks local.
II. Expected: for search-engine purposes, the original page should be declared canonical, just as file descriptions link the remote repository as canonical. <link rel="canonical" href="http://en.wikisource.org/wiki/Page:The_Oxford_book_of_Italian_verse.djvu/114" /> should be <link rel="canonical" href="http://it.wikisource.org/wiki/Pagina:The_Oxford_book_of_Italian_verse.djvu/114" />
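
To make the observed/expected difference easy to check, here is a minimal Python sketch that extracts the rel="canonical" href from a page's HTML and compares it with the expected remote URL. The names CanonicalLinkParser, find_canonical and is_canonical_remote are made up for illustration and are not part of MediaWiki or ProofreadPage; the HTML fragment is the observed <link> quoted above, and fetching the live page is left out.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical_hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical_hrefs.append(attr_map["href"])


def find_canonical(html_text):
    """Return the first canonical href found in html_text, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html_text)
    return parser.canonical_hrefs[0] if parser.canonical_hrefs else None


def is_canonical_remote(html_text, expected_href):
    """True if the page declares expected_href as its canonical URL."""
    return find_canonical(html_text) == expected_href


# Observed markup and expected href, both taken from the task description.
OBSERVED_HTML = (
    '<link rel="canonical" href="http://en.wikisource.org/wiki/'
    'Page:The_Oxford_book_of_Italian_verse.djvu/114" />'
)
EXPECTED_HREF = (
    "http://it.wikisource.org/wiki/"
    "Pagina:The_Oxford_book_of_Italian_verse.djvu/114"
)

if __name__ == "__main__":
    print(find_canonical(OBSERVED_HTML))
    print(is_canonical_remote(OBSERVED_HTML, EXPECTED_HREF))
```

With the observed markup, the comparison fails because the page still points at its local en.wikisource.org URL.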

The number of such pages indexed by Google seems limited: https://www.google.it/search?q=%22This+page+does+not+need+to+be+proofread.+Its+text+comes+from%22+site:wikisource.org&ie=utf-8&oe=utf-8&gws_rd=cr&ei=ct62VIkcwvlqlKCCqAY
It looks like the transcluded text is not indexed: https://webcache.googleusercontent.com/search?q=cache:MaIPZ183Tf4J:wikisource.org/wiki/Page:J%25C3%25B3zef_Gara_-_Zbi%25C3%25B3r_wierszy_o_wilamowskich_obrz%25C4%2599dach_i_obyczajach.pdf/11+&cd=1&hl=it&ct=clnk&gl=it

Event Timeline

Nemo_bis raised the priority of this task from to Low.
Nemo_bis updated the task description.
Nemo_bis added projects: ProofreadPage, Crosswiki.
Nemo_bis subscribed.

Transclusion between wikis (using templates like iwpage) is not managed by the ProofreadPage extension. So it shouldn't be fixed in the extension unless support for interwiki transclusion is added to it (and I believe it won't be, as long as there is no clean way to do such things).

I think it should be done in https://wikisource.org/wiki/MediaWiki:InterWikiTransclusion.js, which is the current backend for {{iwpage}}.

Tpt set Security to None.
Tpt added a subscriber: Phe.
Nemo_bis changed the task status from Open to Stalled. Jan 10 2015, 8:08 PM

Ah. It was too good to be true, I was indeed surprised by this feature. There isn't anything a script can do to solve this bug, so it will only be fixable when the feature is added to the extension. It would be good to have that tracked in the PP component, blocked on any necessary core task (T11890?).

The content is transcluded through JavaScript, and I'm unsure whether Google is able to index it. There is a lot of uncertainty about what Google and other search engines do with JavaScript: some content created by JS is indexed, some isn't, and some search engines don't support JavaScript at all. Anyway, even on it.wikisource.org the following query doesn't return the right Page:

site:it.wikisource.org "misurando a passi tardi"

Are all Page: pages on it.wikisource.org blocked from indexing through its robots.txt?
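
One way to answer this would be to test a Page: URL against the site's robots.txt with Python's standard urllib.robotparser. The sketch below is only illustrative: the rule in HYPOTHETICAL_ROBOTS_TXT is invented for the example and is not the actual content of it.wikisource.org's robots.txt, and is_blocked is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT the real robots.txt of it.wikisource.org.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Pagina:
"""


def is_blocked(robots_txt, user_agent, url):
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


PAGE_URL = (
    "https://it.wikisource.org/wiki/"
    "Pagina:The_Oxford_book_of_Italian_verse.djvu/114"
)
INDEX_URL = (
    "https://it.wikisource.org/wiki/"
    "Indice:The_Oxford_book_of_Italian_verse.djvu"
)

if __name__ == "__main__":
    # For a live check, one would instead call parser.set_url(...) with the
    # site's robots.txt URL followed by parser.read(); that needs network access.
    print(is_blocked(HYPOTHETICAL_ROBOTS_TXT, "Googlebot", PAGE_URL))
    print(is_blocked(HYPOTHETICAL_ROBOTS_TXT, "Googlebot", INDEX_URL))
```

Note that robots.txt is only one way to keep pages out of an index; a noindex meta tag on the pages themselves would not show up in this check.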

As Tpt pointed out, this can't be solved without clean transwiki transclusion, so the problem of the canonical <link> will remain.