https://en.wikipedia.org/w/index.php?title=Christian_Georg_Kohlrausch&diff=next&oldid=682549552
''[[Wikipedia:Link rot|<span title=" since September 2015">dead link</span>]]'']
Same thing here, most likely.
https://sv.wikipedia.org/w/index.php?title=Sverige&diff=prev&oldid=30649922
Reported on the Swedish Village Pump as a problem https://sv.wikipedia.org/wiki/Wikipedia:Bybrunnen#Problem_med_VisualEditor
Since this breaks appearance of the articles it edits (https://sv.wikipedia.org/w/index.php?title=Sverige&oldid=30649922#cite_note-68), this should be prioritised accordingly.
We may be talking about two different things here.
For the first one, see the question in the title: have you been able to verify (independently or with the users who made the edits) if the issue is due to copy/paste, or if there's another way to reproduce it?
For the second one, we're talking about a link which looks like
[http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf Atheism: Contemporary Rates and Patterns {{pdf}}] .
Why is the PDF template inside the link? Everything works just fine when it's placed outside.
Actually, it's the same problem: putting a template inside a link.
What the user did was select the entire line, including the dead link template, and apply a different link to it.
I'll change the title of this task accordingly, but I'm inclined to think this is not a bug, but more users trying to kill MediaWiki...
(edit comment conflict)
I was assuming those were the same problem, since in both cases the template was substituted or something similar. If they are different errors, feel free to split this. However, in the first URL the template was also inside the brackets.
As long as it doesn't kill MediaWiki and it works as a workaround, the expected result should not be altered by VE.
Can we get a dump of exactly what the HTML looks like before and after the edit? That will help us determine if this is a Parsoid or a VE bug.
In theory having a template in the link text should not be a problem for Parsoid as far as I know.
URL to diff:
https://sv.wikipedia.org/w/index.php?title=Sverige&diff=next&oldid=30692384
wikicode before VE edit:
eller icke-troende (på Gud).<ref>Zuckerman, Phil (2007), [http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf Atheism: Contemporary Rates and Patterns {{pdf}}] i ''Cambridge Companion to Atheism''. Cambridge: Cambridge University Press. ISBN 0-521-60367-6</ref> År 2013 var ändå
HTML code before VE edit:
eller icke-troende (på Gud).<sup id="cite_ref-68" class="reference"><a href="#cite_note-68"><span class="cite-reference-link-bracket">[</span>68<span class="cite-reference-link-bracket">]</span></a></sup> År 2013 var ändå
<li id="cite_note-68"><a href="#cite_ref-68">^</a> <span class="reference-text">Zuckerman, Phil (2007), <a rel="nofollow" class="external text" href="http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf">Atheism: Contemporary Rates and Patterns</a> <a href="/wiki/Fil:Noia_64_mimetypes_pdf.png" class="image"><img alt="Noia 64 mimetypes pdf.png" src="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/14px-Noia_64_mimetypes_pdf.png" width="14" height="14" srcset="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/21px-Noia_64_mimetypes_pdf.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/28px-Noia_64_mimetypes_pdf.png 2x" data-file-width="64" data-file-height="64" /></a> <small><a href="/wiki/Portable_Document_Format" title="Portable Document Format">PDF</a></small> i <i>Cambridge Companion to Atheism</i>. Cambridge: Cambridge University Press. <a href="/wiki/Special:Bokk%C3%A4llor/0521603676" class="internal mw-magiclink-isbn">ISBN 0-521-60367-6</a></span></li>
wikicode after VE edit:
eller icke-troende (på Gud).<ref>Zuckerman, Phil (2007), [http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf Atheism: Contemporary Rates and Patterns {{pdf}}][./Fil:Noia_64_mimetypes_pdf.png [[Fil:Noia_64_mimetypes_pdf.png|link=|14x14px]]] <small>[[Portable Document Format|PDF]]</small> i ''Cambridge Companion to Atheism''. Cambridge: Cambridge University Press. ISBN 0-521-60367-6</ref> År 2013 var ändå
HTML code after VE edit:
eller icke-troende (på Gud).<sup id="cite_ref-68" class="reference"><a href="#cite_note-68"><span class="cite-reference-link-bracket">[</span>68<span class="cite-reference-link-bracket">]</span></a></sup> År 2013 var ändå
<li id="cite_note-68"><a href="#cite_ref-68">^</a> <span class="reference-text">Zuckerman, Phil (2007), <a rel="nofollow" class="external text" href="http://www.pitzer.edu/academics/faculty/zuckerman/Ath-Chap-under-7000.pdf">Atheism: Contemporary Rates and Patterns</a> <a href="/wiki/Fil:Noia_64_mimetypes_pdf.png" class="image"><img alt="Noia 64 mimetypes pdf.png" src="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/14px-Noia_64_mimetypes_pdf.png" width="14" height="14" srcset="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/21px-Noia_64_mimetypes_pdf.png 1.5x, upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/28px-Noia_64_mimetypes_pdf.png 2x" data-file-width="64" data-file-height="64" /></a> <small><a href="/wiki/Portable_Document_Format" title="Portable Document Format">PDF</a></small>[./Fil:Noia_64_mimetypes_pdf.png <img alt="Noia 64 mimetypes pdf.png" src="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/14px-Noia_64_mimetypes_pdf.png" width="14" height="14" srcset="upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/21px-Noia_64_mimetypes_pdf.png 1.5x, upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/28px-Noia_64_mimetypes_pdf.png 2x" data-file-width="64" data-file-height="64" />] <small><a href="/wiki/Portable_Document_Format" title="Portable Document Format">PDF</a></small> i <i>Cambridge Companion to Atheism</i>. Cambridge: Cambridge University Press. <a href="/wiki/Special:Bokk%C3%A4llor/0521603676" class="internal mw-magiclink-isbn">ISBN 0-521-60367-6</a></span></li>
I think I have found a similar situation here:
There was an ISBN template inside a URL link (which is garbage, I know), and VE moved the link end bracket and inserted unnecessary span tags. You can see it happen repeatedly over a series of edits here:
It makes unnecessary cleanup work for gnomes. VE should ignore templates inside of URL links, please, not make them worse.
Could the original just be another case of "you can't copy wikitext templates when you're reading an HTML page", and therefore a case of T54091?
@Jonesey95, I think that error may be specific to the {{ISBN}} template. Compare the same edit with {{fact}} vs the ISBN template: https://en.wikipedia.org/w/index.php?title=User:Whatamidoing_(WMF)/sandbox&diff=935153206&oldid=935153183&diffmode=source vs https://en.wikipedia.org/w/index.php?title=User:Whatamidoing_(WMF)/sandbox&diff=935153094&oldid=935153060&diffmode=source (Placing the {{fact}} tag looks obviously broken beforehand, too.) Maybe split that to a different task?
Looks like this is a case of wikilink-in-extlink / image-in-extlink wikitext. In the output, there are multiple link fragments instead of just one.
[subbu@earth:~/work/wmf/parsoid] echo '[http://foo.bar/some.pdf Some pdf file {{pdf}}]' | php bin/parse.php --domain sv.wikipedia.org --body_only --normalize
<p><a href="http://foo.bar/some.pdf">Some pdf file <figure-inline></figure-inline></a><a href="Fil:Noia_64_mimetypes_pdf.png"><img src="//upload.wikimedia.org/wikipedia/commons/7/7d/Noia_64_mimetypes_pdf.png" data-file-width="64" data-file-height="64" data-file-type="bitmap" height="14" width="14" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/28px-Noia_64_mimetypes_pdf.png 2x, //upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Noia_64_mimetypes_pdf.png/21px-Noia_64_mimetypes_pdf.png 1.5x"/></a> <small><a href="Portable_Document_Format" title="Portable Document Format">PDF</a></small></p>
[subbu@earth:~/work/wmf/parsoid] echo '[http://foo.bar/some.pdf Some pdf file {{pdf}}]' | php bin/parse.php --domain sv.wikipedia.org --wt2wt
[http://foo.bar/some.pdf Some pdf file {{pdf}}][[Fil:Noia 64 mimetypes pdf.png|14x14px]] <small>[[Portable Document Format|PDF]]</small>
We consider links-in-links bad wikitext since in HTML, you cannot embed links inside links. All attempts to do that will cause the output to render multiple links. While Parsoid has a linting rule for flagging wikilink-in-extlink scenarios, we don't lint for image-in-extlink scenarios. I'll file a phab task for that.
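To make the pattern concrete, here is a rough Python sketch of how one might flag external links whose label contains a wikilink, a file link, or a template. This is not Parsoid's or Linter's actual rule; the function names (`extlink_body`, `find_suspect_extlinks`) and the list of file namespace aliases are assumptions made up for this illustration. A template is only reported as a "possible" problem, because without expanding it there is no way to know whether it emits a link or an image the way {{pdf}} does on svwiki.

```python
import re

# A single "[" followed by a URL opens an external link ("[[" opens a wikilink).
EXTLINK_OPEN = re.compile(r"(?<!\[)\[(?=(?:https?:)?//)")
# Namespace aliases are an assumption; real wikis configure their own.
FILE_LINK = re.compile(r"\[\[\s*(?:File|Image|Fil|Bild)\s*:", re.IGNORECASE)
WIKILINK = re.compile(r"\[\[(?!\s*(?:File|Image|Fil|Bild)\s*:)", re.IGNORECASE)


def extlink_body(wikitext, start):
    """Return the text between the opening '[' (just before `start`) and the
    closing ']' of an external link, or None if the link never closes."""
    i = start
    nesting = 0  # depth inside [[...]] or {{...}}
    while i < len(wikitext):
        two = wikitext[i:i + 2]
        if two in ("[[", "{{"):
            nesting += 1
            i += 2
        elif two in ("]]", "}}") and nesting > 0:
            nesting -= 1
            i += 2
        elif wikitext[i] == "]" and nesting == 0:
            return wikitext[start:i]
        elif wikitext[i] == "\n":
            return None  # treat a line break as an unterminated link
        else:
            i += 1
    return None


def find_suspect_extlinks(wikitext):
    """List (url, kinds) for external links whose label nests other markup."""
    results = []
    for match in EXTLINK_OPEN.finditer(wikitext):
        body = extlink_body(wikitext, match.end())
        if body is None:
            continue
        url, _, label = body.partition(" ")
        kinds = []
        if FILE_LINK.search(label):
            kinds.append("file")
        if WIKILINK.search(label):
            kinds.append("wikilink")
        if "{{" in label:
            kinds.append("template")  # only a *possible* link-in-link
        if kinds:
            results.append((url, kinds))
    return results


if __name__ == "__main__":
    sample = "[http://foo.bar/some.pdf Some pdf file {{pdf}}]"
    print(find_suspect_extlinks(sample))
```

Run against the two command-line inputs in this thread, the sketch reports the {{pdf}} case as `template` and the `[[Foo]]` case as `wikilink`; an external link with a plain text label is not reported.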
I am glad to see that some of these very old VE bugs are being worked on, but....
Yes, we know that there are existing Linter errors, and volunteers are working on fixing them by the million. In the meantime, VE should stop making articles worse just by saving them, as demonstrated in the diffs above. That's all we are asking. VE has multiple (very old) bugs like this that just make more work for volunteer gnomes. All we are asking is that VE ignore this text, not try to be clever and fail.
Sorry, I didn't communicate the entirety of why this is not an issue in my previous post.
What I pasted above was Parsoid's behavior when we don't use selective serialization -- just to demonstrate that the issue is indeed the use of links inside links. However, on the production wikis, Parsoid uses selective serialization, i.e. it tries to detect the edited portions of the document and converts only those pieces to new wikitext. So, on all these pages that have these link-in-link or image-in-link wikitext errors, Parsoid will NOT dirty the page simply by saving edits on that page. Otherwise, by now, we would have introduced changes on all the pages that have this error any time someone makes an edit anywhere on the page.
The only time Parsoid introduces a dirty diff is if you actually edit the content that came from the link-in-link / image-in-link -- as in the examples in this bug report. It is possible that back in 2015, our selective serializer was more crude and might have introduced dirty diffs in a few additional scenarios, but in 2020, it is a lot more refined and a lot more bugs have been shaken out.
Here is a mock edit session simulated from my commandline that demonstrates this:
[subbu@earth:~/work/wmf/parsoid] echo -e 'A\n\n[http://foo.bar/some.link [[Foo]] image]' > /tmp/wt
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php < /tmp/wt > /tmp/old.html
[subbu@earth:~/work/wmf/parsoid] sed 's/A<\/p>/EDITED TEXT<\/p>/g;' < /tmp/old.html > /tmp/new.html
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --selser --html2wt --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html > /tmp/edited.wt
[subbu@earth:~/work/wmf/parsoid] diff /tmp/wt /tmp/edited.wt
1c1
< A
---
> EDITED TEXT
So, over there, I edited the opening paragraph that contained the text A and changed it to EDITED TEXT. When I use the selective serialization algorithm on the edited HTML, you can see that the only change it made was to that paragraph. It did not corrupt the broken link-in-link wikitext.
Hope that clarifies the matter better.
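For readers who want the idea in miniature, the following Python toy mimics the principle of selective serialization. It is emphatically not Parsoid's algorithm, which works on a DOM and is far more involved. The functions `parse`, `serialize`, `full_serialize` and `selective_serialize` are hypothetical stand-ins, and "lossy whitespace collapsing" stands in for any markup that does not round-trip cleanly.

```python
def parse(wikitext):
    """Toy wt2html: one <p> per blank-line-separated block.
    Whitespace is collapsed, so the conversion is deliberately lossy."""
    return ["<p>" + " ".join(block.split()) + "</p>"
            for block in wikitext.split("\n\n")]


def serialize(html_block):
    """Toy html2wt for a single <p> block."""
    return html_block[len("<p>"):-len("</p>")]


def full_serialize(new_html):
    """Regenerate wikitext for every block, edited or not."""
    return "\n\n".join(serialize(block) for block in new_html)


def selective_serialize(old_wikitext, old_html, new_html):
    """Reuse the original source of unchanged blocks; re-serialize the rest."""
    old_blocks = old_wikitext.split("\n\n")
    if len(old_html) != len(new_html):
        # Toy limitation: give up when blocks were added or removed.
        return full_serialize(new_html)
    out = []
    for source, before, after in zip(old_blocks, old_html, new_html):
        out.append(source if before == after else serialize(after))
    return "\n\n".join(out)


if __name__ == "__main__":
    # Irregular spacing plays the role of wikitext that does not round-trip.
    old_wt = "A\n\n[http://foo.bar/some.link  [[Foo]]   image]"
    old_html = parse(old_wt)
    new_html = ["<p>EDITED TEXT</p>", old_html[1]]  # edit only the first block
    print(selective_serialize(old_wt, old_html, new_html))
    print(full_serialize(new_html))
```

In this toy, the selective path leaves the second block byte-for-byte identical to the original, while the full path rewrites it even though nobody touched it, which is the "dirty diff" being discussed.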
Thank you for the more detailed answer, the key part of which for me is that this may have been happening way back in 2015 (or possibly still in March 2019, in the diff I provided above, where a new section was added and the unchanged section above it appears to have been modified poorly by Parsoid), but it is probably not happening now. That seems like a valid reason to close this bug. We gnomes will continue to fix the links-in-links errors.