Page MenuHomePhabricator

Dirty diff and other weird corruption in table edit
Closed, ResolvedPublic

Description

https://www.mediawiki.org/w/index.php?title=Manual:Hooks&diff=2763398&oldid=2737756 added spaces in every table row, oddly broke up a <code> block, and did something strange to the {{#translate:}} parser function.

https://www.mediawiki.org/w/index.php?title=Manual:Hooks&diff=next&oldid=2763398 further mangled the categories and did something else weird in the code block, although I suppose that may have been @MarkAHershberger trying to fix the earlier screw-ups.

Event Timeline

Anomie created this task.Apr 24 2018, 1:42 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 24 2018, 1:42 PM

This looks like a case where Parsoid serialized the page without access to the HTML of the previous revision and did a normalizing html -> wt transformation. To be verified.

(1) As for <code> changes, that is correct and expected. This is because the <code> block is split up across the 2 lines because of paragraph wrapping works, thanks to doBlockLevels (see T134469 and dupes).
(2) As for mangled categories, that is because the category links are not real category links (if they are, that is a separate bug). So, Parsoid generates text for them in the HTML output. However, in the HTML -> wt direction, Parsoid's code uses a bunch of heuristics to determine if a specific piece of text in the HTML will be mistaken for wikitext and if so, slaps nowiki wrappers around them. So, in this case, our heuristics fall short. We could improve that, but in the general case, given wikitext, there is not a perfect solution. So, we'll ignore this as an edge case here.

So, the only thing left to resolve is whether VE did something to the page OR as I speculated there, there was a RB failure with providing Parsoid HTML for the previous revision. If the latter, it is likely a transient issue that will be non-repeatable.

ssastry added a comment.EditedApr 24 2018, 1:51 PM

Ah, I have to look at the part about the {{#translate}} parser function.

Ah, I have to look at the part about the {{#translate}} parser function.

Right, this is the part in (2) above and the bug is in the html -> wt part for [[Category:Hooks{{#translation:}}]] - Parsoid is serializing the template part as a template not as a parser function which is what mangled it and made it an invalid category link. Filed T192913: html2wt: category links (and probably other such) with a parser-function generated piece doesn't roundtrip properly for it.

Deskana added a subscriber: Deskana.

Moving to external for now since @ssastry thinks this is primarily a Parsoid problem; if he determines that it's not, then I'll move it back to triage.

So, the only thing left to resolve is whether VE did something to the page OR as I speculated there, there was a RB failure with providing Parsoid HTML for the previous revision. If the latter, it is likely a transient issue that will be non-repeatable.

As it turns out, neither. https://www.mediawiki.org/api/rest_v1/page/data-parsoid/Manual%3AHooks/2763872/b90f620e-47f3-11e8-af8e-70dae7575daa shows that the data-parsoid blob only has 16 entries, which is clearly broken for a page as large as that. So, without that information, Parsoid defaults to regular non-selective serialization which normalizes content and touches unmodified content.

Turns out I can reproduce this locally as well: http://localhost:8000/www.mediawiki.org/v3/page/pagebundle/Manual%3AHooks/2763872

But, parse.js --prefix mediawikiwiki --pboutfile /tmp/dp.json --pageName 'Manual:Hooks' < /dev/null > /dev/null does the right thing. So, some edge case bug in the Parsoid API endpoint likely.

ssastry assigned this task to Arlolra.Apr 24 2018, 7:35 PM
ssastry triaged this task as Normal priority.

But, parse.js --prefix mediawikiwiki --pboutfile /tmp/dp.json --pageName 'Manual:Hooks' < /dev/null > /dev/null does the right thing. So, some edge case bug in the Parsoid API endpoint likely.

Try that with --wrapSections though

I assume what's happening is that a bug / necessary condition of section wrapping is extending the template ranges to include parts of the page with legitimate dsr and that it is being discarded when cleanup happens. And then sections are dropped in the client and Parsoid is forced to serialize w/o the dsr info, which I guess means we should run cleanup before section wrapping?

Change 428831 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Sections that expand tpl ranges lose roundtrip data when dropped

https://gerrit.wikimedia.org/r/428831

Change 428831 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Sections that expand tpl ranges lose roundtrip data when dropped

https://gerrit.wikimedia.org/r/428831

Change 429988 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Unconditionally strip section tags

https://gerrit.wikimedia.org/r/429988

Change 429988 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Unconditionally strip section tags

https://gerrit.wikimedia.org/r/429988

Mentioned in SAL (#wikimedia-operations) [2018-05-07T20:48:00Z] <arlolra> Updated Parsoid to 6e38948 (T192909)

Arlolra closed this task as Resolved.May 7 2018, 8:51 PM
Restricted Application added a project: User-Ryasmeen. · View Herald TranscriptMay 7 2018, 8:51 PM
Vvjjkkii renamed this task from Dirty diff and other weird corruption in table edit to qceaaaaaaa.Jul 1 2018, 1:14 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Arlolra as the assignee of this task.
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii edited subscribers, added: Arlolra; removed: gerritbot, Aklapper.
Ryasmeen renamed this task from qceaaaaaaa to Dirty diff and other weird corruption in table edit.Jul 1 2018, 4:41 AM
Ryasmeen closed this task as Resolved.
Ryasmeen assigned this task to Arlolra.
Ryasmeen lowered the priority of this task from High to Normal.
Ryasmeen updated the task description. (Show Details)
Ryasmeen added subscribers: GerritBot, Aklapper.