Page MenuHomePhabricator

Parsoid doesn't detect fostered content from a table and causes corruption on DiscussionTools edits
Open, MediumPublic

Description

See Diff 1 and Diff2.

Both diffs are on https://fr.wikipedia.org/wiki/Projet:Psychologie/Caf%C3%A9_des_psys.

That page has a {{Projet:Psychologie/Café des psys/Introduction}} right at the top embedded in <noinclude>..</noinclude>.

If you look at the source of https://fr.wikipedia.org/wiki/Projet:Psychologie/Caf%C3%A9_des_psys/Introduction, right at the bottom it has this wikitext:

{| style="margin-top:40px; border-top:3px solid #AAAAAA; width:100%;" cellpadding="0" cellspacing="0"
|-
| valign=center height=60px |
...
|-
<div>

The <div> and effectively the content that follows the template on the original page is now in a fosterable position and does get fostered out of the table (the output of core parser as well as Parsoid are consistent here). However, Parsoid's fostered content detection algorithm fails to detect this. Given this, on HTML -> WT conversion, there is selser corruption (duplication of some unmarked fostered content).

Since DiscussionTools relies on Parsoid's fostered lint on these pages to prevent reply-tool from being enabled on these pages, the missing lint means DiscussionTools cannot stop reply tool from being used.

Note that using VE on this page can also cause similar content duplication errors.

Event Timeline

ssastry renamed this task from Fostered content from a table isn't detected causing corruption in DiscussionTools edits to Parsoid doesn't detect fostered content from a table and causes corruption on DiscussionTools edits.Dec 7 2020, 1:23 AM
ssastry triaged this task as Medium priority.

However, Parsoid's fostered content detection algorithm fails to detect this

We don't emit foster boxes when in transclusion since the template wrapping will take care roundtripping the content,
https://github.com/wikimedia/parsoid/blob/master/src/Wt2Html/HTML5TreeBuilder.php#L283-L292

T290936 is similar only in that there seems to be a false assumption that Parsoid will report any fostering on a page.

since the template wrapping will take care roundtripping the content,

But why isn't it in this case ...

Can the content be placed outside of the template wrapper after fostering?

The problem here is that,

  1. foster detection is disabled because the table originates from a template and, in that case, we assume the template wrapping will roundtrip the content, which is usually the case
  2. the template originating table here is unclosed so that it ends up fostering its own template end marker! in which case, the table isn't transclusion marked and gets duplicated, as visible in the diffs

This seems familiar and was probably discussed the balanced template conversations.

From that patch (which was for T141905: Parsoid crashes from logs):

* The conclusion of the long comment in the patch is that we need to
   either,

   1) Fix up encapsulateTemplates() to handle flipped ranges in the
      really edgy case they occur,

   2) Or, emit foster boxes even in transclusion, since they may be
      fostering their own closing meta.  Try, {{1x|<table><div>}}

I had the idea today that we should be able to prevent this problem on the DiscussionTools's side, by detecting malformed about-groups (any nodes with the same 'about' attribute that aren't consecutive siblings). I'm not sure if that would handle all cases, but it would definitely handle the zh.wp issue from the latest duplicate.

On second thought that doesn't work when there's just one fostered node, which is probably more common (e.g. if a template is wrapped in a <div>), so that's probably not worth the effort.