This task covers two separate wikitext corruption issues that occurred when adding comments to pages using DiscussionTools. Both were triggered by the deployment of the replying API (T252558), but they had different causes and solutions. We should have made two tasks, but it wasn't initially clear that the issues were unrelated, and now all of the comments are on this task, so let's keep it as is.
Issue 1
(occurred on: 6-7 August; affected edits: ~50-100, see T260393#6384098)
When the replying API serialized the modified document to HTML, it encoded various characters as HTML entities rather than plain text. As a result, Parsoid's selser did not recognize unmodified parts of the page, which caused dirty diffs.
Most of the changes did not damage the page and only caused distracting diffs, e.g. namespaces in internal links being changed to the canonical ones, external links and internal link anchors being percent-encoded, and spaces at the ends of lines being removed.
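As a rough illustration of why entity encoding matters, here is a toy model in Python. It is not Parsoid's actual selser logic (which is not shown in this task); the helper name `entity_encode_non_ascii` and the string comparison are assumptions made for illustration only. It shows that the same text, once entity-encoded, no longer compares equal to the original even though it decodes back to the same characters.

```python
import html


def entity_encode_non_ascii(text: str) -> str:
    """Hypothetical helper: replace non-ASCII characters with numeric HTML entities."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


original = "Wikipédia-vita:Válaszeszköz"
reserialized = entity_encode_non_ascii(original)

# A naive string comparison treats the untouched text as modified...
looks_modified = reserialized != original

# ...even though decoding the entities gives back exactly the same text.
same_text_after_decoding = html.unescape(reserialized) == original
```

In this toy model, anything that compares the serialized output to the original without decoding entities first would see a change where none was intended.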
- https://hu.wikipedia.org/w/index.php?title=Wikipédia-vita:Válaszeszköz&diff=22924986
- https://sv.wikipedia.org/w/index.php?title=Wikipedia:Bybrunnen&diff=48094324&oldid=48093752
However, some of them exposed bugs in Parsoid that generate incorrect wikitext, in particular unnecessarily generating a |link= parameter (or a localised version) for images (T108504). Also, percent-encoded links are quite annoying in non-Latin-alphabet languages.
- https://ko.wikipedia.org/w/index.php?title=사용자토론:Gomdoli4696&curid=2667493&diff=27270968&oldid=27270965&diffmode=source
- https://ar.wikipedia.org/w/index.php?title=نقاش_المستخدمة:شيماء&curid=7331385&diff=49547774&oldid=49546611&diffmode=source
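To show why percent-encoded links are a problem for non-Latin scripts, here is a small Python sketch using the standard `urllib.parse` module. The helper name `percent_encode_title` is made up for this example, and keeping `:` unencoded is an assumption; the exact encoding rules that applied in the affected edits may differ.

```python
from urllib.parse import quote, unquote


def percent_encode_title(title: str) -> str:
    """Hypothetical helper: percent-encode a page title, leaving ':' as-is."""
    return quote(title, safe=":")


title = "사용자토론:Gomdoli4696"
encoded = percent_encode_title(title)

# The encoded form is pure ASCII and much longer than the readable title,
# but it still decodes back to the same page name.
round_trips = unquote(encoded) == title
```

The link target is unchanged, but the wikitext becomes unreadable for editors, which is why these diffs were disruptive even when they did not break anything.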
Fixed in: https://gerrit.wikimedia.org/r/619018
Issue 2
(occurred on: 12-13 August; affected edits: 2)
When Parsoid was transforming an HTML document to wikitext, if there was very high replication lag ("replag"), and if a specific page revision was requested rather than "latest", Parsoid's selser would fetch the wikitext for unmodified parts of the page from the wrong revision, causing page corruption.
At the time some database replica servers were delayed by several hours due to maintenance.
The DiscussionTools replying API loads the latest revision of the page using a database query, then asks Parsoid for that revision. In contrast, VisualEditor asks Parsoid for the "latest" revision and then checks what revision it received, so it was never affected by this issue. (It's like this because the former was simpler to implement in server-side code and the latter was simpler in client-side code.)
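The difference between the two request patterns can be sketched in Python as follows. This is only a control-flow illustration: `ToyParsoid`, `ParsoidResponse` and both loader functions are hypothetical stand-ins, not the real Parsoid or DiscussionTools code, and the sketch does not model replication lag itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ParsoidResponse:
    revid: int
    html: str


class ToyParsoid:
    """Hypothetical in-memory stand-in for a Parsoid client."""

    def __init__(self, revisions: Dict[int, str]) -> None:
        self.revisions = revisions

    def fetch(self, revid: Optional[int] = None) -> ParsoidResponse:
        # revid=None models a request for the "latest" revision.
        if revid is None:
            revid = max(self.revisions)
        return ParsoidResponse(revid, self.revisions[revid])


def load_like_replying_api(revid_from_db: int, parsoid: ToyParsoid) -> ParsoidResponse:
    # Pattern 1: the revision ID is decided first (by a database query),
    # then that specific revision is requested.
    return parsoid.fetch(revid_from_db)


def load_like_visualeditor(parsoid: ToyParsoid) -> ParsoidResponse:
    # Pattern 2: ask for "latest" and read the revision ID from the response.
    return parsoid.fetch()


parsoid = ToyParsoid({1: "<p>old</p>", 2: "<p>new</p>"})
pinned = load_like_replying_api(2, parsoid)
latest = load_like_visualeditor(parsoid)
```

When everything is in sync, both patterns return the same revision. The issue described above arose only with the first pattern, where a specific revision was requested during high replag.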
Only two edits were affected:
- https://hu.wikipedia.org/w/index.php?oldid=22943756&diff=prev&diffmode=source
- https://nl.wikipedia.org/w/index.php?oldid=56909511&diff=prev&diffmode=source
Fixed in: https://gerrit.wikimedia.org/r/621621
Testing instructions
At the Arabic, Catalan, Chinese, Czech, Dutch, Korean, Serbian and Swedish beta clusters [i], do the following:
- Write and publish a comment in said wiki's native language using the Reply Tool's visual mode
- Write and publish a comment in said wiki's native language using the Reply Tool's source mode
- Ensure that "Step 1" and "Step 2" above did not cause dirty diffs (read: no additional changes are made to the page beyond the text that was written with the Reply Tool being added)
- ⚠️Please pay special attention to the issues described in the "Issue 1" and "Issue 2" sections above.
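The dirty-diff check in the steps above could be done by hand in the diff view, or roughly automated as in the Python sketch below, using the standard `difflib` module. The function names and the sample wikitext are made up for illustration; a real check would need the actual before and after revisions and the exact lines the Reply Tool is expected to add.

```python
import difflib
from typing import List, Tuple


def changed_lines(before: str, after: str) -> Tuple[List[str], List[str]]:
    """Return (removed, added) lines between two wikitext revisions."""
    removed, added = [], []
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


def is_clean_reply(before: str, after: str, reply_lines: List[str]) -> bool:
    """True if the only change is the addition of exactly the expected reply lines."""
    removed, added = changed_lines(before, after)
    return not removed and added == reply_lines


before = "== Topic ==\nFirst comment. \n"
clean_after = "== Topic ==\nFirst comment. \n:A reply\n"
# Dirty: the trailing space on the existing line was stripped as a side effect.
dirty_after = "== Topic ==\nFirst comment.\n:A reply\n"

clean_ok = is_clean_reply(before, clean_after, [":A reply"])
dirty_ok = is_clean_reply(before, dirty_after, [":A reply"])
```

Any removed line, or any added line other than the reply itself, marks the edit as a dirty diff.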
i. https://meta.wikimedia.beta.wmflabs.org/wiki/Special:SiteMatrix