Steps to reproduce:
- Install mediawiki/extensions/DoubleWiki
- Create a page with the following content:
hello <div id="align-fr" style="display:none;"><pre> hello = hello" onclick="alert('XSS') </pre></div> [[fr:test]]
- After saving, add ?match=fr to the page URL
- Click "hello" on the page
This will pop up a JavaScript dialog.
Explanation:
DoubleWiki displays a page side by side with the corresponding page on another project, following the page's interwiki links. This works on any page with interwiki links on a wiki where the extension is enabled. Example: https://en.wikisource.org/wiki/Bible_(King_James)/Genesis?match=fr
Optionally their content can be aligned using the <div …><pre> markup described above. Example: https://en.wikisource.org/wiki/Eskimo_Life/Authors_Preface?match=no (see the markup in edit mode: https://en.wikisource.org/w/index.php?title=Eskimo_Life/Authors_Preface&action=edit)
This is implemented using some very indiscriminate regexp replacements, which allow inserting pieces of HTML into places they weren't supposed to go. This bug is caused by the code in the DoubleWiki::addMatchingTags() method.
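To illustrate the class of bug (this is not the extension's actual code, which is PHP), here is a minimal Python sketch. It assumes, based only on the shape of the payload above, that the right-hand side of a "left = right" alignment pair ends up inside a double-quoted HTML attribute without escaping. The function names and the choice of a title attribute are hypothetical.

```python
import html
import re


def add_tag_unsafe(page_html: str, left: str, right: str) -> str:
    """Hypothetical stand-in for an unescaped regexp replacement.

    Wraps the first occurrence of `left` in a <span> and copies `right`
    verbatim into an attribute value (assumed behaviour, for illustration).
    """
    return re.sub(
        re.escape(left),
        lambda m: '<span title="' + right + '">' + m.group(0) + '</span>',
        page_html,
        count=1,
    )


def add_tag_escaped(page_html: str, left: str, right: str) -> str:
    """Same replacement, but `right` is escaped for an attribute context."""
    safe_right = html.escape(right, quote=True)
    return re.sub(
        re.escape(left),
        lambda m: '<span title="' + safe_right + '">' + m.group(0) + '</span>',
        page_html,
        count=1,
    )


if __name__ == "__main__":
    # Right-hand side taken from the reproduction steps above.
    payload = "hello\" onclick=\"alert('XSS')"
    print(add_tag_unsafe("<p>hello world</p>", "hello", payload))
    print(add_tag_escaped("<p>hello world</p>", "hello", payload))
```

In the unsafe variant the double quote in the payload closes the attribute early, so the rest of the payload is parsed as a separate onclick attribute. In the escaped variant the quote becomes &quot; and stays inside the attribute value. Escaping at the point of insertion is only one possible mitigation; building the output with a DOM or HTML-builder API instead of string replacement would be another.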
The code implementing alignment has a lot of regexp and string operations on HTML. I have only reviewed a very small part of it (I stopped once I found a bug). There are roughly 200 lines of code (methods addMatchingTags(), matchColumns(), findParagraphs(), findSlices()) that all look very risky. The side-by-side rendering itself (getMangledTextAndTranslation()) does some regexp replacements as well, but they look a lot less scary.
(This came to my attention after I read T323376#8406975 and tried to figure out what that code does.)