The interwiki prefix changes in https://en.wikipedia.org/w/index.php?title=Ecclesiology&diff=943075359&oldid=937425525 shouldn't have happened.
@PrimeHunter posted details for reproduction at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Wikisophia_links?
The reproduction details are now at,
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_179#Wikisophia_links?
The HTML that Parsoid is being sent looks as follows (which seems OK),
... <figure typeof="mw:Image/Thumb" class="mw-default-size" id="mwFw"><a href="./File:Example.jpg" id="mwGA"><img src="//upload.wikimedia.org/wikipedia/en/thumb/a/a9/Example.jpg/220px-Example.jpg" width="220" height="238" resource="./File:Example.jpg" alt="" data-file-width="275" data-file-height="297" data-file-type="bitmap" srcset="//upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg 2x, //upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg 1.5x" id="mwGQ"></a><figcaption id="mwGg"><p>Broken: <a href="https://en.wiktionary.org/wiki/Test" rel="mw:WikiLink/Interwiki">wiktionary</a> <a href="https://en.wikiquote.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikiquote</a> <a href="https://en.wikisource.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikisource</a> <a href="https://en.wikiversity.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikiversity</a> <a href="https://en.wikivoyage.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikivoyage</a> Working: <a href="./Test" rel="mw:WikiLink" class="mw-disambig" title="Test">wikipedia</a> <a href="https://en.wikinews.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikinews</a> <a href="https://en.wikibooks.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikibooks</a> <a href="https://species.wikimedia.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikispecies</a> <a href="https://www.wikidata.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikidata</a> <a href="https://foundation.wikimedia.org/wiki/Test" rel="mw:WikiLink/Interwiki">foundation</a> <a href="https://commons.wikimedia.org/wiki/Test" rel="mw:WikiLink/Interwiki">commons</a> <a href="https://meta.wikimedia.org/wiki/Test" rel="mw:WikiLink/Interwiki">meta</a> <a href="https://www.mediawiki.org/wiki/Test" rel="mw:WikiLink/Interwiki">mediawikiwiki</a> <a href="https://phabricator.wikimedia.org/Test" rel="mw:WikiLink/Interwiki">phabricator</a></p></figcaption></figure>
Using the method described to run tools on scandium,
https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing#Running_Parsoid_tools_on_scandium
arlolra@scandium:/srv/parsoid-testing$ cat t.html | sudo -u www-data php /srv/mediawiki/multiversion/MWScript.php /srv/parsoid-testing/bin/parse.php --wiki=enwiki --integrated --html2wt
[[File:Example.jpg|alt=|thumb|Broken: [[wikiwikiweb:Test|wiktionary]] [[wikipediawikipedia:Test|wikiquote]] [[wikisophia:Test|wikisource]] [[wikiti:Test|wikiversity]] [[wikiversity:Test|wikivoyage]] Working: [[Test|wikipedia]] [[wikinews:Test|wikinews]] [[wikibooks:Test|wikibooks]] [[species:Test|wikispecies]] [[wikidata:Test|wikidata]] [[foundation:Test|foundation]] [[c:Test|commons]] [[metawiki:Test|meta]] [[mw:Test|mediawikiwiki]] [[phab:Test|phabricator]]]]
where t.html is the above html.
I imagine the bug is in,
https://github.com/wikimedia/parsoid/blob/master/extension/src/Config/SiteConfig.php#L353-L394
and the root cause is in T231568
The reason for this needing to be in an image caption seems to be that VE sends back,
<!-- pasted outside an image caption -->
<a href="https://en.wiktionary.org/wiki/Test" rel="mw:WikiLink/Interwiki" title="wikt:Test" id="mwCA">wiktionary</a> <a href="https://en.wikiquote.org/wiki/Test" rel="mw:WikiLink/Interwiki" title="q:Test" id="mwCQ">wikiquote</a> <a href="https://en.wikisource.org/wiki/Test" rel="mw:WikiLink/Interwiki" title="s:Test" id="mwCg">wikisource</a>
<!-- pasted inside an image caption -->
<a href="https://en.wiktionary.org/wiki/Test" rel="mw:WikiLink/Interwiki">wiktionary</a> <a href="https://en.wikiquote.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikiquote</a> <a href="https://en.wikisource.org/wiki/Test" rel="mw:WikiLink/Interwiki">wikisource</a>
so the data-parsoid for the associated ids gets reused?
The problem stems from,
// Fix up broken interwiki hrefs that are missing a $1 placeholder
// Just append the placeholder at the end.
// This makes sure that the interWikiMatcher below adds one match
// group per URI, and that interwiki links work as expected.
interwiki.url += '$1';
https://github.com/wikimedia/parsoid/blob/master/lib/config/WikiConfig.js#L237-L241
This was ported as,
https://github.com/wikimedia/parsoid/blob/master/src/Utils/ConfigUtils.php#L35
but was not applied to the extension's SiteConfig.php.
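Without $1, the URL pattern contributes no capture group (.*?), so the capture groups in the interwiki matcher no longer line up with its list of prefixes, and a matched href is attributed to the wrong prefix.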
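To illustrate the keying problem, here is a minimal sketch of a matcher that builds one alternation over all interwiki URLs and maps the index of the matching capture group back to a prefix. It is a simplified stand-in, not the actual Parsoid code: the `interwikis` table, the `broken` entry, and the `buildMatcher` helper are all hypothetical, and the real matcher does more than this.

```javascript
// Hypothetical, simplified interwiki table; the 'broken' entry lacks "$1".
const interwikis = [
  ['broken', 'https://broken.example.org/'],
  ['wikt', 'https://en.wiktionary.org/wiki/$1'],
  ['q', 'https://en.wikiquote.org/wiki/$1'],
];

// Escape regex metacharacters in a literal string.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build one alternation over all URLs. Each "$1" becomes a capture group,
// and the index of the group that matched is used to look up the prefix.
function buildMatcher(table, fixMissingPlaceholder) {
  const keys = [];
  const patterns = [];
  for (const [prefix, rawUrl] of table) {
    let url = rawUrl;
    if (fixMissingPlaceholder && !url.includes('$1')) {
      url += '$1'; // the fix: guarantee one capture group per URL
    }
    keys.push(prefix);
    patterns.push(url.split('$1').map(escapeRegExp).join('(.*?)'));
  }
  const re = new RegExp('^(?:' + patterns.join('|') + ')$', 'i');
  return (href) => {
    const m = href.match(re);
    if (!m) return null;
    const groups = m.slice(1);
    for (let i = 0; i < groups.length; i++) {
      if (groups[i] !== undefined) {
        return [keys[i], groups[i]];
      }
    }
    return null;
  };
}

const href = 'https://en.wiktionary.org/wiki/Test';
const withoutFix = buildMatcher(interwikis, false)(href);
const withFix = buildMatcher(interwikis, true)(href);
console.log(withoutFix); // [ 'broken', 'Test' ]  (wrong prefix)
console.log(withFix);    // [ 'wikt', 'Test' ]
```

In this sketch the entry without a placeholder contributes a key but no group, so every later prefix is off by one. That is the same kind of shift seen in the html2wt output above, where hrefs for wiktionary, wikiquote, and wikisource serialize with unrelated prefixes.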
Change 585015 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Add fix for interwiki hrefs missing a $1 placeholder to extension/
Change 585015 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add fix for interwiki hrefs missing a $1 placeholder to extension/
Change 585888 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a9
Change 585888 merged by jenkins-bot:
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a9