Parsoid: Selser drops fostered categories?
Open, Medium, Public

Description

User-facing example: go to https://www.mediawiki.org/wiki/User:Roan_Kattouw_%28WMF%29?veaction=edit , make a small change, click "Save page" then "Review changes". Note that the diff drops the category. Now click "Return to save form" and "Resume editing", undo your changes manually (NOT using the undo function), and review changes again. This time the categories aren't dropped.

Note that in the Parsoid HTML, the category is fostered to before the table. Selser then seems to drop it on the floor, but only if there was a change to the body.

I was unable to reproduce this exact bug on the command line, but I did see weird things:

$ echo '<table><tr><td>Hello</td></tr>[[Category:Foo]]' | node tests/parse.js --wt2wt
[warning][enwiki/Main Page] DSR inconsistency: cs/s mismatch for node: BODY s: 0 ; cs: 30
[[Category:Foo]]<table><tr><td>Hello</td></tr>


$ echo '<table><tr><td>Hello</td></tr>[[Category:Foo]]' | node tests/parse.js --wt2wt --selser
<nowiki>[[Category:Foo]]</nowiki>
{|
|Hello
|}

Version: unspecified
Severity: normal

Details

Reference
bz71074

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 3:58 AM)
bzimport set Reference to bz71074.

This behavior is partly "by design". The current behavior is optimized for scenarios where the table (from which content got fostered) is not edited. We want to prevent corruption (loss of information OR duplication of content) in those scenarios.

The wt2html parse assigns zero-width DSR info to fostered content. This works well in selser: when it encounters these nodes, it just emits an empty string, and the table after it emits its original source, which still contains the fostered content (as long as the table was not edited).

But if the table is edited, the fostered content node before it still serializes to '', and the table is serialized normally from its HTML. Since the table's HTML no longer contains the fostered content, the table doesn't emit it either, and the fostered content is effectively lost.

The solution to this problem would be to fix our DOMDiff to set edit-flags of fostered content nodes to be the same as edit-flags of the table from which they were fostered.
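
To make the failure mode and the proposed fix concrete, here is a toy model in Python. It is not Parsoid code: `Node`, `build`, `selser`, and `propagate_edit_flags` are hypothetical names invented for this sketch, and the real DOMDiff and selser logic is considerably more involved. It only assumes what is described above: fostered nodes have a zero-width DSR, unmodified nodes reuse their original source, and modified nodes are reserialized.

```python
from dataclasses import dataclass
from typing import List, Optional

ORIGINAL = "<table><tr><td>Hello</td></tr>[[Category:Foo]]</table>"


@dataclass
class Node:
    src: str                 # original wikitext covered by the node's DSR range
    fresh: str               # what a full (non-selser) serialization would emit
    modified: bool = False   # edit flag, as a DOM diff would set it
    fostered_from: Optional["Node"] = None  # table this node was fostered out of


def build(edited: bool) -> List[Node]:
    """Build the two top-level nodes: the fostered category, then the table."""
    cell = "Hello there" if edited else "Hello"
    table = Node(
        src=ORIGINAL,  # the table's DSR spans everything, category included
        fresh=f"<table><tr><td>{cell}</td></tr></table>",
        modified=edited,
    )
    # Zero-width DSR: the fostered node covers no original source of its own.
    category = Node(src="", fresh="[[Category:Foo]]\n", fostered_from=table)
    return [category, table]


def selser(nodes: List[Node]) -> str:
    """Reuse original source for unmodified nodes, reserialize modified ones."""
    return "".join(n.fresh if n.modified else n.src for n in nodes)


def propagate_edit_flags(nodes: List[Node]) -> None:
    """Proposed fix (sketch): a fostered node inherits its table's edit flag."""
    for n in nodes:
        if n.fostered_from is not None and n.fostered_from.modified:
            n.modified = True


if __name__ == "__main__":
    print(repr(selser(build(edited=False))))  # table untouched
    print(repr(selser(build(edited=True))))   # table edited, flags not propagated
    fixed = build(edited=True)
    propagate_edit_flags(fixed)
    print(repr(selser(fixed)))                # table edited, flags propagated
```

In this model the unedited case round-trips the original wikitext, the edited case without propagation loses the category, and with propagation the category is emitted before the table, the same way a full html2wt serialization would emit it.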

This still exists and is reproducible as shown below.

[subbu@earth:~/work/wmf/parsoid] echo '<table><tr><td>Hello</td></tr>[[Category:Foo]]</table>' > /tmp/wt
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php < /tmp/wt > /tmp/old.html
[subbu@earth:~/work/wmf/parsoid] sed 's/Hello/Hello there/g;' < /tmp/old.html > /tmp/new.html
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --html2wt < /tmp/new.html 
[[Category:Foo]]
<table><tr><td>Hello there</td></tr></table>

[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --selser --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html 
<table><tr><td>Hello there</td></tr></table>