Bad wikitext lines starting with "| " (not in a table context) getting the pipes removed and replaced by <nowiki> </nowiki> on RT
Closed, Resolved
Public

Description

http://en.wikipedia.org/w/index.php?title=National_Security_Intelligence&diff=next&oldid=558880809 - The corruption was later reproduced in a sandbox by Rybec "by copy-pasting the previous revision to a sandbox and editing just the lead paragraph in the same way in VE. The unwanted changes are displayed in the review window". So, aside from the nowikis, we need to find out why it deletes some bits of the text.

Also adding http://fr.wikipedia.org/w/index.php?title=Romain_Alessandrini&diff=95638901&oldid=95638848 which, like the previous one, happened after a template was left open. I think this is supposed to happen, to prevent | signs from being shown, as they are usually markup and not really wanted in an article. However, I can still see them in View mode, so I need to understand more about this behavior. Thanks.


Version: unspecified
Severity: normal

bzimport added a project: Parsoid. Via Conduit, Nov 22 2014, 1:58 AM
bzimport set Reference to bz52618.
Elitre created this task. Via Legacy, Aug 7 2013, 9:41 PM

Jdforrester-WMF added a comment. Via Conduit, Oct 29 2013, 12:40 AM

This looks to be a Parsoid "bug", though frankly the behaviour of the system when given such broken input is somewhat undefined.

Minimum test case:

Foo

Bar

GWicke added a comment. Via Conduit, Dec 3 2013, 11:26 PM

The issue here is that we are tokenizing this to a td token, which is then dropped by the treebuilder when it does not end up inside a table. We should be able to detect this on the DOM based on shadow info. When detected, we can re-insert the original pipe so that it is not lost.

That might not yet avoid the <nowiki> insertion, but would at least preserve the content. It is also possible that selser avoids the nowiki.
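
The rescue idea can be illustrated with a toy model. This is only a minimal Python sketch under stated assumptions, not Parsoid's actual code: `Token`, `tokenize`, `build`, and the `src` field (a stand-in for the shadow info) are hypothetical names, and the sample input is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str     # "td" for a table-cell token, "text" for anything else
    content: str  # text carried by the token, without the leading pipe
    src: str      # original wikitext of the line (stand-in for shadow info)


def tokenize(line: str) -> Token:
    """Toy tokenizer: every line starting with '|' becomes a td token."""
    if line.startswith("|"):
        return Token("td", line[1:].lstrip(), line)
    return Token("text", line, line)


def build(lines, rescue: bool):
    """Toy tree builder: returns the text that survives for each line."""
    out = []
    in_table = False
    for line in lines:
        if line.startswith("{|"):   # table start
            in_table = True
            out.append(line)
            continue
        if line.startswith("|}"):   # table end
            in_table = False
            out.append(line)
            continue
        tok = tokenize(line)
        if tok.kind == "td" and not in_table:
            # A td outside a table is stripped; without rescue the pipe is
            # lost, with rescue the original source text is re-inserted.
            out.append(tok.src if rescue else tok.content)
        else:
            out.append(tok.src)
    return out


stray = ["| Foo", "", "| Bar"]            # hypothetical input
lost = build(stray, rescue=False)         # pipes dropped
kept = build(stray, rescue=True)          # pipes preserved
```

In this sketch the rescue step only restores the dropped pipe; it says nothing about whether the serializer later wraps that pipe in <nowiki>, which is a separate concern.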

gerritbot added a comment. Via Conduit, Feb 21 2014, 11:35 PM

Change 114897 had a related patch set uploaded by GWicke:
WIP Bug 52618: Rescue stripped tds outside of table context

https://gerrit.wikimedia.org/r/114897

gerritbot added a comment. Via Conduit, Feb 25 2014, 6:31 PM

Change 114897 merged by jenkins-bot:
Bug 52618: Rescue stripped tds outside of table context

https://gerrit.wikimedia.org/r/114897

GWicke added a comment. Via Conduit, Feb 25 2014, 6:55 PM

TODO from the commit summary:

  • Avoid <nowiki>fication on round-trip (even with selser)
  • Avoid paragraph splitting by moving the paragraph wrapper to the DOM (major project)
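
The first TODO item can be sketched as a context check in the serializer. This is a hypothetical Python illustration, not the Parsoid implementation: `escape_line` and its `in_table` flag are assumed names, and the rule that only a leading `|` or `!` needs protecting is a simplification.

```python
def escape_line(text: str, in_table: bool) -> str:
    """Wrap a leading table-syntax character in <nowiki> only where needed.

    Inside a table, a line of literal text starting with '|' or '!' would be
    parsed as a cell, row, or header, so it has to be escaped. Outside a
    table the same characters are plain text and can be emitted unchanged.
    """
    looks_like_table_syntax = text.startswith(("|", "!"))
    if in_table and looks_like_table_syntax:
        return "<nowiki>" + text[0] + "</nowiki>" + text[1:]
    return text


outside = escape_line("| Foo", in_table=False)  # left untouched
inside = escape_line("| Foo", in_table=True)    # leading pipe escaped
```

The design choice in this sketch is to base the escaping decision on where the text sits rather than on the characters alone, so stray pipes outside tables round-trip without added markup.
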

gerritbot added a comment. Via Conduit, Feb 25 2014, 8:01 PM

Change 115436 had a related patch set uploaded by GWicke:
Bug 52618: Avoid <nowiki>fication of td/tr/th syntax outside of tables

https://gerrit.wikimedia.org/r/115436

gerritbot added a comment. Via Conduit, Feb 25 2014, 10:20 PM

Change 115436 merged by jenkins-bot:
Bug 52618: Avoid <nowiki>fication of td/tr/th syntax outside of tables

https://gerrit.wikimedia.org/r/115436

gerritbot added a comment. Via Conduit, Jun 16 2014, 8:36 PM

Change 140015 had a related patch set uploaded by Subramanya Sastry:
(Bug 52618) Suppress <nowiki>s for table WT strings outside tables

https://gerrit.wikimedia.org/r/140015

gerritbot added a comment. Via Conduit, Jun 17 2014, 11:54 PM

Change 140015 merged by jenkins-bot:
(Bug 52618) Suppress <nowiki>s for table WT strings outside tables

https://gerrit.wikimedia.org/r/140015
