
Links-in-link scenarios break template about-grouping in some scenarios.
Open, Medium, Public

Description

VisualEditor (VE) sometimes seems to duplicate an ISBN (in an incorrect format), even during unrelated edits. I've seen this problem on several articles; fr:Objection de conscience is a good example, where an incorrectly formatted ISBN was added multiple times:

<small style="line-height:1em;">[[International Standard Book Number|ISBN]]&nbsp;[[Spécial:Ouvrages_de_référence/9782251694238|<span class="nowrap">9782251694238</span>]])</small>
  • Last edit, with no other change in the same paragraph: added once
  • Previous edit, with no other change in the same paragraph: added once
  • Previous edit, with other changes in the same paragraph: added twice

Could you fix this, since it damages articles?

Event Timeline

NicoV created this task. Sep 3 2020, 12:15 PM
Restricted Application added a subscriber: Aklapper. Sep 3 2020, 12:15 PM
matmarex renamed this task from VE mutliplies ISBN to VE mutliplies ISBN template. Sep 3 2020, 4:07 PM
matmarex added a project: Parsoid.
ssastry added a subscriber: ssastry. Sep 3 2020, 5:35 PM

Yes, this is a scenario where links are embedded in links. Per the HTML spec, that breaks up the HTML, which we normally handle; in this case, however, templates are involved as well, and that makes it difficult for Parsoid to recover properly. We'll see what we can do to handle this scenario.

We also have the https://www.mediawiki.org/wiki/Help:Lint_errors/wikilink-in-extlink linter category, which should hopefully cover this case too. But since this is an ISBN link and not a real wikilink, we may be missing these instances. If so, we'll fix that as well.

Judging from the transcript below, the linting logic doesn't catch this links-in-links scenario. I also just realized that ISBN is not a magic word anymore: the {{ISBN}} template just generates a wikilink, so the linter should have flagged it. There is probably a bug there.

[subbu@earth:~/work/wmf/parsoid] echo "[https://books.google.fr/books?id=bwtnEz5ezOAC&pg=PA82 Google Livres [[Google]] books]" | php bin/parse.php --domain fr.wikipedia.org --linting > /dev/null
{"type":"wikilink-in-extlink","dsr":[0,86,55,1],"params":[]}

[subbu@earth:~/work/wmf/parsoid] echo "[https://books.google.fr/books?id=bwtnEz5ezOAC&pg=PA82 Google Livres : ''Correspondance de Fénelon'' Tome IX, commentaire par Jean Orcibal, pp. 82, Librairie Droz, Genève, 1987 {{ISBN|978-2-600-03629-0}}]" | php bin/parse.php --domain fr.wikipedia.org --linting > /dev/null

As for the primary problem, where templates and link-in-link scenarios interact, it looks like about-id continuity is broken by the HTML5 tree builder's fixups.

In the output below, part of the #mwt1 template content is inside the ext-link <a> tag and another part is outside it. The problem, of course, is that the template markers should have been hoisted up to the <a> extlink, and the entire ext-link plus the spilled-over content should have been marked with template markup and #mwt1 about attributes.

<p data-parsoid='{"dsr":[0,206,0,0]}'><a rel="mw:ExtLink" href="https://books.google.fr/books?id=bwtnEz5ezOAC&amp;pg=PA82" class="external text" data-parsoid='{"dsr":[0,206,55,1]}'>Google Livres<span typeof="mw:DisplaySpace mw:Placeholder" data-parsoid='{"src":" ","dsr":[68,69,0,0]}'> </span>: <i data-parsoid='{"dsr":[71,101,2,2]}'>Correspondance de Fénelon</i> Tome IX, commentaire par Jean Orcibal, pp. 82, Librairie Droz, Genève, 1987 <span about="#mwt1" typeof="mw:Transclusion" data-parsoid='{"pi":[[{"k":"1"}]],"dsr":[179,205,null,null]}' data-mw='{"parts":[{"template":{"target":{"wt":"ISBN","href":"./Modèle:ISBN"},"params":{"1":{"wt":"978-2-600-03629-0"}},"i":0}}]}'> </span><small style="line-height:1em;" about="#mwt1" data-parsoid='{"stx":"html"}'>(</small></a><small style="line-height:1em;" about="#mwt1" data-parsoid='{"stx":"html"}'><a rel="mw:WikiLink" href="./International_Standard_Book_Number" title="International Standard Book Number" data-parsoid='{"stx":"piped","a":{"href":"./International_Standard_Book_Number"},"sa":{"href":"International Standard Book Number"},"dsr":[0,0,null,null],"misnested":true}'>ISBN</a><span typeof="mw:Entity" data-parsoid='{"src":"&amp;nbsp;","srcContent":" ","dsr":[0,0,null,null],"misnested":true}'> </span><a rel="mw:WikiLink" href="./Spécial:Ouvrages_de_référence/978-2-600-03629-0" title="Spécial:Ouvrages de référence/978-2-600-03629-0" data-parsoid='{"stx":"piped","a":{"href":"./Spécial:Ouvrages_de_référence/978-2-600-03629-0"},"sa":{"href":"Spécial:Ouvrages de référence/978-2-600-03629-0"},"dsr":[0,0,null,null],"misnested":true}'><span class="nowrap" data-parsoid='{"stx":"html","dsr":[0,0,null,null],"misnested":true}'>978-2-600-03629-0</span></a>)</small></p>
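A minimal Python sketch of the two checks implied by the output above: detecting that the elements sharing an about id do not form a single run of adjacent siblings, and finding the range of children under their lowest common ancestor that the about id would need to cover once hoisted. It assumes the fragment is well-formed enough for `xml.etree.ElementTree` (real Parsoid HTML would need an HTML5 parser). The function names `split_about_groups` and `hoisted_range` are hypothetical and are not Parsoid's implementation, which is written in PHP.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def _parse(fragment: str) -> ET.Element:
    # Assumption: the fragment is well-formed XML; real Parsoid output
    # would need an HTML5 parser instead.
    return ET.fromstring(f"<root>{fragment}</root>")


def split_about_groups(fragment: str) -> set[str]:
    """Return about ids whose elements are not one run of adjacent siblings."""
    root = _parse(fragment)
    positions = defaultdict(list)  # about id -> [(parent, child index), ...]
    for parent in root.iter():
        for index, child in enumerate(parent):
            about = child.get("about")
            if about is not None:
                positions[about].append((parent, index))
    split = set()
    for about, places in positions.items():
        parents = {id(parent) for parent, _ in places}
        indices = sorted(index for _, index in places)
        adjacent = indices == list(range(indices[0], indices[0] + len(indices)))
        if len(parents) > 1 or not adjacent:
            split.add(about)
    return split


def hoisted_range(fragment: str, about: str) -> tuple[str, int, int]:
    """Return (tag of lowest common ancestor, first child index, last child index)
    covering every element that carries `about`."""
    root = _parse(fragment)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    members = [el for el in root.iter() if el.get("about") == about]
    if not members:
        raise ValueError(f"no elements with about={about!r}")
    chains = []
    for el in members:
        chain = [el]
        while chain[-1] in parent_of:
            chain.append(parent_of[chain[-1]])
        chains.append(chain[::-1])  # root first, member element last
    depth = 0
    # Descend while all chains share the next node and it is still a strict ancestor.
    while all(len(c) > depth + 2 and c[depth + 1] is chains[0][depth + 1] for c in chains):
        depth += 1
    lca = chains[0][depth]
    children = list(lca)
    indices = [next(i for i, ch in enumerate(children) if ch is c[depth + 1]) for c in chains]
    return lca.tag, min(indices), max(indices)
```

For a fragment shaped like the output above (template content split between the ext-link `<a>` and a following `<small>`), the first function should flag `#mwt1` and the second should point at the `<p>` children spanning the `<a>` and the spilled-over `<small>`, which is the range the about id would need to cover.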

@NicoV, independent of what Parsoid should do, {{ISBN}} templates should never be embedded inside [..] extlink markup, since that is bad markup. So if you could update WPCleaner to detect these scenarios (or wait for us to fix our Linter logic) and get frwiki pages cleaned up, that would be very helpful.
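A rough Python sketch of how a cleanup tool could flag {{ISBN}} inside [..] extlink markup at the wikitext level. It is a hypothetical helper, not WPCleaner's or Linter's logic, and it rests on stated assumptions: an external link starts with `[` followed by `http://`, `https://` or `//`, does not span a line break, and ends at the first `]` that is not inside a nested `[[...]]` or `{{...}}`. That follows the editor's intended bracket nesting rather than how the parser actually splits the link.

```python
import re

# Assumed start of external link markup: "[" + http://, https:// or protocol-relative //.
EXTLINK_START = re.compile(r"\[(?:https?:)?//")
# {{ISBN|...}} or {{ISBN}}; only the first letter of a template name is case-insensitive.
ISBN_TEMPLATE = re.compile(r"\{\{\s*[Ii]SBN\s*[|}]")


def extlinks_with_isbn(wikitext: str) -> list[str]:
    """Return each [URL label] external link whose markup contains {{ISBN}}."""
    hits = []
    for match in EXTLINK_START.finditer(wikitext):
        start = match.start()
        if start > 0 and wikitext[start - 1] == "[":
            continue  # "[[..." is wikilink syntax, not an external link
        link_depth = template_depth = 0
        i = match.end()
        while i < len(wikitext):
            pair = wikitext[i:i + 2]
            if pair == "[[":
                link_depth += 1
                i += 2
            elif pair == "]]" and link_depth:
                link_depth -= 1
                i += 2
            elif pair == "{{":
                template_depth += 1
                i += 2
            elif pair == "}}" and template_depth:
                template_depth -= 1
                i += 2
            elif wikitext[i] == "\n":
                break  # no closing "]" on this line: not a bracketed external link
            elif wikitext[i] == "]" and not link_depth and not template_depth:
                link = wikitext[start:i + 1]
                if ISBN_TEMPLATE.search(link):
                    hits.append(link)
                break
            else:
                i += 1
    return hits
```

Run on the second parse.php input from the transcript, this returns the whole external link; on the first input (which nests `[[Google]]` but no ISBN) it returns an empty list.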

NicoV added a comment. Edited Sep 8 2020, 6:59 PM

@ssastry
Thanks for the answers. WPCleaner can already detect the {{ISBN}} template inside external links, and can fix some instances automatically (I think those where the {{ISBN}} is near the end of the external link). More complex cases are difficult to fix automatically.
Examples of automatic fixes: Raymond Nart, Samuel Jacob Rubinstein
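The simple case mentioned here, an {{ISBN}} call sitting at the very end of the link label, could be fixed by moving the template just after the closing bracket. The Python sketch below is a hypothetical illustration, not WPCleaner's code. It only handles single-line links whose label contains no `]` (so no nested wikilinks) and leaves every other case untouched for manual review.

```python
import re

# Hypothetical pattern: a single-line [URL label] external link whose label
# contains no "]" (so no nested [[...]] links) and ends with one {{ISBN|...}} call.
TRAILING_ISBN = re.compile(
    r"(\[(?:https?:)?//[^\s\]]+ [^\]\n]*?)\s*(\{\{\s*[Ii]SBN\s*\|[^{}\n]*\}\})\s*\]"
)


def move_trailing_isbn(wikitext: str) -> str:
    """Move a {{ISBN}} call at the end of an external link label out of the link."""
    # Group 1 is "[URL label" without trailing spaces, group 2 is the template call.
    return TRAILING_ISBN.sub(r"\1] \2", wikitext)
```

Applied to the second parse.php input from the transcript, this closes the link after "1987" and leaves `{{ISBN|978-2-600-03629-0}}` outside it; a link where the ISBN is followed by more label text is returned unchanged.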

I already knew that Linter misses many links in links; see T242068. If you'd like examples of such problems, you can compare what Linter finds on frwiki with what my bot finds.

I started fixing links in links on frwiki some time ago, but it is a big task.

Izno renamed this task from VE mutliplies ISBN template to VE multiplies ISBN template. Sep 10 2020, 10:40 PM
ssastry renamed this task from VE multiplies ISBN template to Links-in-link scenarios break template about-grouping in some scenarios. Sep 11 2020, 5:35 PM
ssastry triaged this task as Medium priority.
ssastry moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.