
The interwiki maps might be wrong in VisualEditor
Closed, Resolved (Public)

Description

Event Timeline

LGoto triaged this task as Medium priority. (Mar 6 2020, 5:08 PM)

The reproduction details are now at:
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_179#Wikisophia_links?

The HTML being sent to Parsoid looks as follows (which seems OK):

...
<figure typeof="mw:Image/Thumb" class="mw-default-size" id="mwFw"><a href="./File:Example.jpg" id="mwGA"><img src="//upload.wikimedia.org/wikipedia/en/thumb/a/a9/Example.jpg/220px-Example.jpg" width="220" height="238" resource="./File:Example.jpg" alt="" data-file-width="275" data-file-height="297" data-file-type="bitmap" srcset="//upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg 2x, //upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg 1.5x" id="mwGQ"></a><figcaption id="mwGg"><p>Broken: <a href="https://en.wiktionary.org/wiki/Test" rel="mw:WikiLink/Interwiki">wiktionary</a> <a href="https://en.wikiquote.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikiquote</a> <a href="https://en.wikisource.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikisource</a> <a href="https://en.wikiversity.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikiversity</a> <a href="https://en.wikivoyage.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikivoyage</a> Working: <a href="./Test" rel="mw:WikiLink" class="mw-disambig" title="Test">wikipedia</a> <a href="https://en.wikinews.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikinews</a> <a href="https://en.wikibooks.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikibooks</a> <a href="https://species.wikimedia.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikispecies</a> <a href="https://www.wikidata.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikidata</a> <a href="https://foundation.wikimedia.org/wiki/Test" rel="mw:WikiLink/Interwiki">foundation</a> <a href="https://commons.wikimedia.org/wiki/Test" rel="mw:WikiLink/Interwiki">commons</a> <a href="https://meta.wikimedia.org/wiki/Test" rel="mw:WikiLink/Interwiki">meta</a> <a href="https://www.mediawiki.org/wiki/Test" rel="mw:WikiLink/Interwiki">mediawikiwiki</a> <a href="https://phabricator.wikimedia.org/Test" rel="mw:WikiLink/Interwiki">phabricator</a></p></figcaption></figure>

Using the method described for running Parsoid tools on scandium:
https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing#Running_Parsoid_tools_on_scandium

arlolra@scandium:/srv/parsoid-testing$ cat t.html | sudo -u www-data php /srv/mediawiki/multiversion/MWScript.php /srv/parsoid-testing/bin/parse.php --wiki=enwiki --integrated --html2wt
[[File:Example.jpg|alt=|thumb|Broken: [[wikiwikiweb:Test|wiktionary]] [[wikipediawikipedia:Test|wikiquote]] [[wikisophia:Test|wikisource]] [[wikiti:Test|wikiversity]] [[wikiversity:Test|wikivoyage]] Working: [[Test|wikipedia]] [[wikinews:Test|wikinews]] [[wikibooks:Test|wikibooks]] [[species:Test|wikispecies]] [[wikidata:Test|wikidata]] [[foundation:Test|foundation]] [[c:Test|commons]] [[metawiki:Test|meta]] [[mw:Test|mediawikiwiki]] [[phab:Test|phabricator]]]]

where t.html contains the HTML above.

I imagine the bug is in:
https://github.com/wikimedia/parsoid/blob/master/extension/src/Config/SiteConfig.php#L353-L394

and the root cause is in T231568.

The reason this only reproduces inside an image caption seems to be what VE sends back:

<!-- pasted outside an image caption -->
<a href="https://en.wiktionary.org/wiki/Test" rel="mw:WikiLink/Interwiki" title="wikt:Test" id="mwCA">wiktionary</a>
<a href="https://en.wikiquote.org/wiki/Test" rel="mw:WikiLink/Interwiki" title="q:Test" id="mwCQ">wikiquote</a>
<a href="https://en.wikisource.org/wiki/Test" rel="mw:WikiLink/Interwiki" title="s:Test" id="mwCg">wikisource</a>

<!-- pasted inside an image caption -->
<a href="https://en.wiktionary.org/wiki/Test" rel="mw:WikiLink/Interwiki">wiktionary</a>
<a href="https://en.wikiquote.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikiquote</a>
<a href="https://en.wikisource.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikisource</a>

so outside a caption, the data-parsoid for the associated ids presumably gets reused?

The problem stems from:

			// Fix up broken interwiki hrefs that are missing a $1 placeholder
			// Just append the placeholder at the end.
			// This makes sure that the interWikiMatcher below adds one match
			// group per URI, and that interwiki links work as expected.
			interwiki.url += '$1';

https://github.com/wikimedia/parsoid/blob/master/lib/config/WikiConfig.js#L237-L241

This was ported as,
https://github.com/wikimedia/parsoid/blob/master/src/Utils/ConfigUtils.php#L35

but the fix was not applied to the extension's SiteConfig.php.

Without $1, no capture group (.*?) is generated for that URL, so the group indexes in the interwiki matcher shift and matches get keyed to the wrong prefix.
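
To make the index shift concrete, here is a minimal sketch of the failure mode. It is not the actual Parsoid matcher: `interwikiMap`, `buildMatcher`, `lookup` and the `broken.example` entry are all hypothetical names made up for illustration. It only assumes what is described above, namely that each URL contributes one capture group to a combined regex and that the prefix is looked up by group position.

```javascript
// Hypothetical, simplified interwiki map (the real one comes from site config).
const interwikiMap = [
  { prefix: 'broken', url: 'https://broken.example/wiki/' }, // missing $1
  { prefix: 'wikt', url: 'https://en.wiktionary.org/wiki/$1' },
  { prefix: 'q', url: 'https://en.wikiquote.org/wiki/$1' },
];

// Escape regex metacharacters in a literal string.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one combined regex; each $1 placeholder becomes a (.*?) capture group.
// With fixMissingPlaceholder set, URLs lacking $1 get it appended first.
function buildMatcher(map, fixMissingPlaceholder) {
  const prefixes = [];
  const patterns = [];
  for (const { prefix, url } of map) {
    let u = url;
    if (fixMissingPlaceholder && !u.includes('$1')) {
      u += '$1';
    }
    prefixes.push(prefix);
    patterns.push('^' + escapeRegExp(u).replace(/\\\$1/g, '(.*?)') + '$');
  }
  return { prefixes, regex: new RegExp(patterns.join('|'), 'i') };
}

// Key the result by the position of the first capture group that matched.
// This assumes exactly one group per map entry.
function lookup(matcher, href) {
  const m = matcher.regex.exec(href);
  if (!m) {
    return null;
  }
  for (let i = 1; i < m.length; i++) {
    if (m[i] !== undefined) {
      return { prefix: matcher.prefixes[i - 1], target: m[i] };
    }
  }
  return null;
}

const buggy = buildMatcher(interwikiMap, false);
const fixed = buildMatcher(interwikiMap, true);
const href = 'https://en.wikiquote.org/wiki/Test';
console.log(lookup(buggy, href).prefix); // prints "wikt" (off by one)
console.log(lookup(fixed, href).prefix); // prints "q"
```

In the buggy matcher the first entry contributes no group, so every later URL is keyed one slot too early. That is the same kind of mismatch as the html2wt output above, where the wikiquote href came back with an unrelated prefix.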

Change 585015 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Add fix for interwiki hrefs missing a $1 placeholder to extension/

https://gerrit.wikimedia.org/r/585015

Change 585015 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add fix for interwiki hrefs missing a $1 placeholder to extension/

https://gerrit.wikimedia.org/r/585015

Change 585888 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a9

https://gerrit.wikimedia.org/r/585888

Change 585888 merged by jenkins-bot:
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a9

https://gerrit.wikimedia.org/r/585888