
Template inside a link renders weirdly after the page gets VEdited
Closed, Declined · Public · 8 Estimated Story Points

Description

https://en.wikipedia.org/w/index.php?title=Christian_Georg_Kohlrausch&diff=next&oldid=682549552

''[[Wikipedia:Link rot|<span title=" since September 2015">dead link</span>]]''&#x5D;

Event Timeline

Josve05a raised the priority of this task to Needs Triage.
Josve05a updated the task description.
Josve05a added a project: VisualEditor.
Josve05a subscribed.
Jdforrester-WMF renamed this task from "VE substituted a template to <div>-tags" to "A template got substituted to <div>-tags (due to copy-and-paste?)". Sep 29 2015, 7:08 PM
Jdforrester-WMF triaged this task as Low priority.
Jdforrester-WMF set Security to None.
Jdforrester-WMF edited a custom field.
Jdforrester-WMF moved this task from To Triage to Freezer on the VisualEditor board.
Josve05a renamed this task from "A template got substituted to <div>-tags (due to copy-and-paste?)" to "A template got substituted (due to copy-and-paste?)". Oct 3 2015, 9:53 PM
Josve05a raised the priority of this task from Low to Medium. Oct 4 2015, 12:14 PM

Since this breaks the appearance of the articles it edits (https://sv.wikipedia.org/w/index.php?title=Sverige&oldid=30649922#cite_note-68), it should be prioritised accordingly.

We may be talking about two different things here.
For the first one, see the question in the title: have you been able to verify (independently or with the users who made the edits) whether the issue is due to copy/paste, or whether there's another way to reproduce it?

For the second one, we're talking about a link that looks like this:
[http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf Atheism: Contemporary Rates and Patterns {{pdf}}]
Why is the PDF template inside the link? Everything works just fine when it's placed outside.
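Moving the template outside the link label, as suggested here, is the kind of cleanup a gnome could script. The following is a hypothetical `sed` sketch, assuming the template is literally `{{pdf}}` or `{{Pdf}}` at the very end of the label; the function name `move_pdf_out_of_link` is made up for illustration. Links whose label consists only of the template are deliberately left untouched, since moving it out would leave an empty label.

```shell
# Hypothetical cleanup helper (not an existing tool): rewrite
#   [URL label {{pdf}}]  ->  [URL label] {{pdf}}
# Assumes the template is exactly {{pdf}}/{{Pdf}} at the end of the label.
# The label must contain at least one non-space character before the
# template, so [URL {{pdf}}] is left as-is.
move_pdf_out_of_link() {
  sed -E 's#(\[https?://[^] ]+ [^]]*[^] ]) *(\{\{[Pp]df}})\]#\1] \2#g' "$@"
}

printf '%s\n' \
  '[http://foo.bar/some.pdf Some pdf file {{pdf}}]' \
  '[http://foo.bar/some.pdf {{pdf}}]' |
  move_pdf_out_of_link
```

The first line comes out as `[http://foo.bar/some.pdf Some pdf file] {{pdf}}`; the second is printed unchanged.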

Actually, it's the same problem: putting a template inside a link.
What the user did was select the entire line, including the dead link template, and apply a different link to it.
I'll change the title of this task accordingly, but I'm inclined to think this is not a bug so much as users trying to kill MediaWiki...

(edit comment conflict)
I was assuming those were the same problem, since both involved a substitution or something similar. If they are different errors, feel free to split this. However, in the first URL the template was also inside the brackets.

Elitre renamed this task from "A template got substituted (due to copy-and-paste?)" to "Template inside a link renders weirdly after the page gets VEdited". Nov 21 2015, 2:43 PM
Elitre edited projects, added VisualEditor-Links; removed VisualEditor-CopyPaste.

> users trying to kill MediaWiki...

As long as it doesn't kill MediaWiki and it works as a workaround, VE should not alter the expected result.

cscott subscribed.

Can we get a dump of exactly what the HTML looks like before and after the edit? That will help us determine if this is a Parsoid or a VE bug.

In theory having a template in the link text should not be a problem for Parsoid as far as I know.

> Can we get a dump of exactly what the HTML looks like before and after the edit? That will help us determine if this is a Parsoid or a VE bug.

@cscott, https://www.mediawiki.org/wiki/Parsoid/Debugging#Dumping_HTML_DOM_in_VE

URL to diff:

https://sv.wikipedia.org/w/index.php?title=Sverige&diff=next&oldid=30692384

wikicode before VE edit:

eller icke-troende (på Gud).<ref>Zuckerman, Phil (2007), [http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf Atheism: Contemporary Rates and Patterns {{pdf}}] i ''Cambridge Companion to Atheism''. Cambridge: Cambridge University Press. ISBN 0-521-60367-6</ref> År 2013 var ändå

HTML code before VE edit:

eller icke-troende (på Gud).<sup id="cite_ref-68" class="reference"><a href="#cite_note-68"><span class="cite-reference-link-bracket">[</span>68<span class="cite-reference-link-bracket">]</span></a></sup> År 2013 var ändå

<li id="cite_note-68"><a href="#cite_ref-68">^</a> <span class="reference-text">Zuckerman, Phil (2007), <a rel="nofollow" class="external text" href="http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf">Atheism: Contemporary Rates and Patterns</a> <a href="/wiki/Fil:Noia_64_mimetypes_pdf.png" class="image"><img alt="Noia 64 mimetypes pdf.png" src="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/14px-Noia_64_mimetypes_pdf.png" width="14" height="14" srcset="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/21px-Noia_64_mimetypes_pdf.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/28px-Noia_64_mimetypes_pdf.png 2x" data-file-width="64" data-file-height="64" /></a>&#160;<small><a href="/wiki/Portable_Document_Format" title="Portable Document Format">PDF</a></small> i <i>Cambridge Companion to Atheism</i>. Cambridge: Cambridge University Press. <a href="/wiki/Special:Bokk%C3%A4llor/0521603676" class="internal mw-magiclink-isbn">ISBN 0-521-60367-6</a></span></li>

wikicode after VE edit:

eller icke-troende (på Gud).<ref>Zuckerman, Phil (2007), [http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf Atheism: Contemporary Rates and Patterns {{pdf}}][./Fil:Noia_64_mimetypes_pdf.png [[Fil:Noia_64_mimetypes_pdf.png|link=|14x14px]]]&nbsp;<small>[[Portable Document Format|PDF]]</small> i ''Cambridge Companion to Atheism''. Cambridge: Cambridge University Press. ISBN 0-521-60367-6</ref> År 2013 var ändå

HTML code after VE edit:

eller icke-troende (på Gud).<sup id="cite_ref-68" class="reference"><a href="#cite_note-68"><span class="cite-reference-link-bracket">[</span>68<span class="cite-reference-link-bracket">]</span></a></sup> År 2013 var ändå

<li id="cite_note-68"><a href="#cite_ref-68">^</a> <span class="reference-text">Zuckerman, Phil (2007), <a rel="nofollow" class="external text" href="http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf">Atheism: Contemporary Rates and Patterns</a> <a href="/wiki/Fil:Noia_64_mimetypes_pdf.png" class="image"><img alt="Noia 64 mimetypes pdf.png" src="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/14px-Noia_64_mimetypes_pdf.png" width="14" height="14" srcset="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/21px-Noia_64_mimetypes_pdf.png 1.5x, upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/28px-Noia_64_mimetypes_pdf.png 2x" data-file-width="64" data-file-height="64" /></a>&#160;<small><a href="/wiki/Portable_Document_Format" title="Portable Document Format">PDF</a></small>[./Fil:Noia_64_mimetypes_pdf.png <img alt="Noia 64 mimetypes pdf.png" src="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/14px-Noia_64_mimetypes_pdf.png" width="14" height="14" srcset="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/21px-Noia_64_mimetypes_pdf.png 1.5x, upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/28px-Noia_64_mimetypes_pdf.png 2x" data-file-width="64" data-file-height="64" />]&#160;<small><a href="/wiki/Portable_Document_Format" title="Portable Document Format">PDF</a></small> i <i>Cambridge Companion to Atheism</i>. Cambridge: Cambridge University Press. <a href="/wiki/Special:Bokk%C3%A4llor/0521603676" class="internal mw-magiclink-isbn">ISBN 0-521-60367-6</a></span></li>

I think I have found a similar situation here:

https://en.wikipedia.org/w/index.php?title=Latin_American_music_in_the_United_States&diff=prev&oldid=889404456

There was an ISBN template inside a URL link (which is garbage, I know), and VE moved the link end bracket and inserted unnecessary span tags. You can see it happen repeatedly over a series of edits here:

https://en.wikipedia.org/w/index.php?diff=891141344&oldid=890457473&title=Role_of_music_in_World_War_II&type=revision

It makes unnecessary cleanup work for gnomes. Please have VE leave templates inside URL links alone rather than make them worse.

Could the original just be another case of "you can't copy wikitext templates when you're reading an HTML page", and therefore a case of T54091?

@Jonesey95, I think that error may be specific to the {{ISBN}} template. Compare the same edit with {{fact}} vs the ISBN template: https://en.wikipedia.org/w/index.php?title=User:Whatamidoing_(WMF)/sandbox&diff=935153206&oldid=935153183&diffmode=source vs https://en.wikipedia.org/w/index.php?title=User:Whatamidoing_(WMF)/sandbox&diff=935153094&oldid=935153060&diffmode=source (Placing the {{fact}} tag looks obviously broken beforehand, too.) Maybe split that to a different task?

This looks like a case of wikilink-in-extlink / image-in-extlink wikitext. In the output, there are multiple link fragments instead of just one.

[subbu@earth:~/work/wmf/parsoid] echo '[http://foo.bar/some.pdf Some pdf file {{pdf}}]' | php bin/parse.php --domain sv.wikipedia.org --body_only --normalize

<p><a href="http://foo.bar/some.pdf">Some pdf file <figure-inline></figure-inline></a><a href="Fil:Noia_64_mimetypes_pdf.png"><img src="//upload.wikimedia.org/wikipedia/commons/7/7d/Noia_64_mimetypes_pdf.png" data-file-width="64" data-file-height="64" data-file-type="bitmap" height="14" width="14" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/28px-Noia_64_mimetypes_pdf.png 2x, //upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/21px-Noia_64_mimetypes_pdf.png 1.5x"/></a> <small><a href="Portable_Document_Format" title="Portable Document Format">PDF</a></small></p>

[subbu@earth:~/work/wmf/parsoid] echo '[http://foo.bar/some.pdf Some pdf file {{pdf}}]' | php bin/parse.php --domain sv.wikipedia.org --wt2wt
[http://foo.bar/some.pdf Some pdf file {{pdf}}][[Fil:Noia 64 mimetypes pdf.png|14x14px]]&nbsp;<small>[[Portable Document Format|PDF]]</small>

We consider links-in-links bad wikitext since in HTML, you cannot embed links inside links. All attempts to do that will cause the output to render multiple links. While Parsoid has a linting rule for flagging wikilink-in-extlink scenarios, we don't lint for image-in-extlink scenarios. I'll file a phab task for that.
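Since the nesting lives in the wikitext, a rough way to find candidate pages is to scan for external-link labels containing `[[` or `{{`. This is a hypothetical line-based heuristic (the name `find_link_in_extlink` is invented), not Parsoid's linter: it ignores links that span lines, and a `{{...}}` match only means the template *might* expand to a link or image, as `{{pdf}}` does.

```shell
# Hypothetical heuristic, NOT Parsoid's lint rule: print (with line numbers)
# external links whose label contains a wikilink/image ([[...]]) or a
# template ({{...}}) before the closing ']'.
find_link_in_extlink() {
  grep -nE '\[https?://[^] ]+ [^]]*(\[\[|\{\{)' "$@"
}

printf '%s\n' \
  '[http://foo.bar/some.pdf Some pdf file {{pdf}}]' \
  '[http://foo.bar/some.link [[Foo]] image]' \
  '[http://foo.bar/ok.pdf Fine label] {{pdf}}' |
  find_link_in_extlink
```

Lines 1 and 2 are reported; line 3 is not, because its template sits after the closing bracket.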

I am glad to see that some of these very old VE bugs are being worked on, but....

Yes, we know that there are existing Linter errors, and volunteers are working on fixing them by the million. In the meantime, VE should stop making articles worse just by saving them, as demonstrated in the diffs above. That's all we are asking. VE has multiple (very old) bugs like this that just make more work for volunteer gnomes. All we are asking is that VE ignore this text, not try to be clever and fail.

Sorry, I didn't communicate the entirety of why this is not an issue in my previous post.

What I pasted above was Parsoid's behavior when we don't use selective serialization, just to demonstrate that the issue is indeed the use of links inside links. However, on the production wikis, Parsoid uses selective serialization, i.e. it tries to detect the edited portions of the document and converts only those pieces to new wikitext. So, on all the pages that have these link-in-link or image-in-link wikitext errors, Parsoid will NOT dirty the page simply because an edit elsewhere on that page is saved. Otherwise, by now, we would have introduced changes on every page that has this error any time someone made an edit anywhere on the page.

The only time Parsoid introduces a dirty diff is if you actually edit the content that came from the link-in-link / image-in-link, as in the examples in this bug report. It is possible that back in 2015 our selective serializer was cruder and introduced dirty diffs in a few additional scenarios, but in 2020 it is a lot more refined and many more bugs have been shaken out.
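To make the selective-serialization idea concrete, here is a toy model in shell and awk. It is a simplification under a loudly hypothetical assumption, not Parsoid's selser: it pretends the old wikitext, old HTML and new HTML are aligned one block per line, reuses the original wikitext for every block whose HTML is unchanged, and runs a crude tag-stripping "serializer" only on edited blocks. The `selser` function and the placeholder HTML are invented for illustration.

```shell
# Toy model of selective serialization; NOT Parsoid's algorithm.
# Hypothetical simplification: line N of every file is the same block.
selser() {  # usage: selser OLD_WT OLD_HTML NEW_HTML
  awk '
    FILENAME == ARGV[1] { wt[FNR] = $0; next }
    FILENAME == ARGV[2] { oldhtml[FNR] = $0; next }
    {
      if (($0 "") == (oldhtml[FNR] "")) {   # force string comparison
        print wt[FNR]                       # unchanged block: reuse original wikitext
      } else {
        line = $0
        gsub(/<[^>]*>/, "", line)           # edited block: crude "serializer" strips tags
        print line
      }
    }
  ' "$1" "$2" "$3"
}

dir=$(mktemp -d)
printf '%s\n' 'A' '[http://foo.bar/some.link [[Foo]] image]' > "$dir/old.wt"
# Placeholder standing in for the split-link HTML; not real Parsoid output.
printf '%s\n' '<p>A</p>' '<p>SPLIT-LINK-HTML</p>' > "$dir/old.html"
sed 's/A<\/p>/EDITED TEXT<\/p>/' "$dir/old.html" > "$dir/new.html"
selser "$dir/old.wt" "$dir/old.html" "$dir/new.html" > "$dir/edited.wt"
diff "$dir/old.wt" "$dir/edited.wt" || true   # diff exits 1 when files differ
```

This prints the same `1c1` hunk as the real session: the broken link-in-link line passes through byte-for-byte because its HTML was never touched.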

Here is a mock edit session simulated from my commandline that demonstrates this:

[subbu@earth:~/work/wmf/parsoid] echo -e 'A\n\n[http://foo.bar/some.link [[Foo]] image]' > /tmp/wt
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php < /tmp/wt > /tmp/old.html
[subbu@earth:~/work/wmf/parsoid] sed 's/A<\/p>/EDITED TEXT<\/p>/g;' < /tmp/old.html > /tmp/new.html 
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --selser --html2wt --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html > /tmp/edited.wt
[subbu@earth:~/work/wmf/parsoid] diff /tmp/wt /tmp/edited.wt
1c1
< A
---
> EDITED TEXT

So, there, I edited the opening paragraph that contained the text A and changed it to EDITED TEXT. When I ran the selective serialization algorithm on the edited HTML, you can see that the only change it made was to that paragraph. It did not corrupt the broken link-in-link wikitext.

Hope that clarifies the matter.

Thank you for the more detailed answer, the key part of which for me is that this may have been happening way back in 2015 (or possibly still in March 2019, in the diff I provided above, where a new section was added and the unchanged section above it appears to have been modified poorly by Parsoid), but it is probably not happening now. That seems like a valid reason to close this bug. We gnomes will continue to fix the links-in-links errors.