Page MenuHomePhabricator

Parsoid removes all newlines from the page when editing a table inside a list item `:{|`
Open, Needs TriagePublic

Description

Parsoid removes all newlines from the page when editing a table inside a list item :{|.

Example: (in this edit, I typed "a" after the "test" in the a section)
https://en.wikipedia.org/w/index.php?title=User:Matma_Rex/sandbox&diff=930049246&oldid=930049140&diffmode=source

My guess is that the list item puts Parsoid into a single-line context, which tries to render the rest of the page as if it was a single list item, with no newlines.

Event Timeline

matmarex created this task.Dec 9 2019, 10:11 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 9 2019, 10:11 PM
matmarex updated the task description. (Show Details)Dec 9 2019, 10:12 PM
matmarex updated the task description. (Show Details)Dec 9 2019, 10:16 PM

All the content in the table in your example is found in a fosterable position.

If your table looked like,

:{|
|

Parsoid probably wouldn't have a problem with it, since it does have protection to avoid single-line context constraints in tables,
https://github.com/wikimedia/parsoid/blob/master/src/Html2Wt/DOMHandlers/TableHandler.php#L29-L31

It's the fostered content that ends up being serialized in the list item that results in the single line.

In general, editing fostered content is error prone.

ssastry closed this task as Invalid.Dec 10 2019, 7:21 AM

Being able to corrupt the entire document just because of some unbalanced wikitext elsewhere on the page seems less than ideal...

Maybe we should consider marking fostered content such that it is uneditable?

Maybe we should consider marking fostered content such that it is uneditable?

Maybe? I think that something needs to be done about this. If I introduce a wikitext error in one part of the page, then IMO you should still be able to edit other parts of the page without being blocked by my error, and you should be able to fix my wikitext error.

DLynch added a subscriber: DLynch.Dec 12 2019, 6:21 PM

I'm not convinced this should be invalid -- it's a pretty bad page-corruption issue if bad wikitext goes through Parsoid.

Fostered markup and dealing with the mess it creates has been one of the biggest sources of complexity in the parsoid codebase. We handle a lot of scenarios already but in general, it is hard to catch all issues.

And, in this scenario, the table markup is bogus ... but I suspect this is just one more hack editors are using to get around the limitation of not being able to embed multi-line content in lists. And, they are relying on the fact that the entire content will get fostered, and the empty table will be invisible. So, we'll have to add special hacks to handle this special editor hack.

But one option is to block editing of such tables / list items. But, if we want to limit it to just this scenario vs. blocking editing of all tables that have fostered content, we'll need special detection at which point we might as well handle it. Alternatively, we can just mass disable editing of tables with fostered content. LintErrors already identifies all pages that have this problem. https://en.wikipedia.org/wiki/Special:LintErrors says about 38K instances on enwiki .. which is about 0.1% of pages?

Thoughts? And, perhaps that discussion should be a different ticket.

I'm not convinced this should be invalid -- it's a pretty bad page-corruption issue if bad wikitext goes through Parsoid.

Parsoid does not guarantee it will preserve every broken pattern in use. But, if it is a common enough pattern, we add hacks / workarounds. So, I wouldn't say "bad wikitext" in general.

In any case, my previous comment lays out some options for this particular scenario. How commonly is this hack used on talk pages?

I think what makes this case unique is that the failure is so catastrophic. The entirety of the page (or just after the table?) gets corrupted during the edit.

I think what makes this case unique is that the failure is so catastrophic. The entirety of the page (or just after the table?) gets corrupted during the edit.

Is it just that list item pr everything else on the page beyond the list item? Reading the title and description, I got the impression it is just the table in that list item and hence is limited.

In any case, what are your thoughts on T240280#5737072 and also any info about how common this is?

We'll separately chat about this to see if there is a simple enough fix to detect this scenario.

Is it just that list item pr everything else on the page beyond the list item? Reading the title and description, I got the impression it is just the table in that list item and hence is limited.

The linked edit by Bartosz shows the entire rest of the page getting corrupted.

In this example a lot of the page is corrupted because the previous edit added an unclosed table.

Esanders reopened this task as Open.Dec 12 2019, 7:58 PM

(re-opening as the discussion is active)

Is it just that list item pr everything else on the page beyond the list item? Reading the title and description, I got the impression it is just the table in that list item and hence is limited.

The linked edit by Bartosz shows the entire rest of the page getting corrupted.

In this example a lot of the page is corrupted because the previous edit added an unclosed table.

In both these cases, if you look at the rendering, you will notice that adding that unclosed table wikitext effectively breaks the rendering of the entire page from that point on, i.e. the rest of the page is pulled into the table as fostered content .. and the now empty table is deleted. So, from there, a few questions.

  1. What is the use case that benefits from that brokenness? So, assuming you don't really want to support that use case, one possible solution would be for us to suppress (visual) editability of those tables and associated fostered content which serves as an explicit indicator to editors to fix the broken markup before proceeding. But, I guess the question I asked earlier is if we should make it a more general suppression of editability of table with fosterable content for the same reason - get editors to fix the broken markup. In Parsoid land, we want to remove all these hacks, not add more of them, if we can help it.
  2. But, assuming you want to support this use case from some reason, and you don't want Parsoid to suppress editability either, we can investigate more.

So, yes, I agree Parsoid shouldn't do nasty dirtying of pages with broken markup, but it is a question of what the best strategy here is. Do we want to pile on more code hacks or do we want to disincentivize reliance on such mark up? If this is an uncommon use case, we are willing to say: please fix the markup - we won't support this use case.

cscott added a subscriber: cscott.Tue, Jan 7, 4:33 PM
cscott added a comment.Tue, Jan 7, 4:54 PM

From david: “The ability to enter :{| into any page and have it strip all newlines the next time Parsoid interacts with said page” isn’t great!

This is relevant to talk pages, since : is auto-inserted by the post-a-reply tool to get proper indentation. So the user "just" made a little typo when trying to create a table in the reply, and suddenly the whole page gets corrupted.

From david: “The ability to enter :{| into any page and have it strip all newlines the next time Parsoid interacts with said page” isn’t great!
This is relevant to talk pages, since : is auto-inserted by the post-a-reply tool to get proper indentation. So the user "just" made a little typo when trying to create a table in the reply, and suddenly the whole page gets corrupted.

I am not trying to be difficult, but

  • If someone enters a "{|" after a ":" (accidentally or not), it immediately breaks the rendering of the page (Parsoid not involved here). So, it will require a revert / fix of this broken syntax. That is why I was hesitant to prioritze this highly for Parsoid. But, yes, I get the notion that ppl can do source wikitext comment even in its broken form and have the revert happen later .. but that affordance isn't there with Parsoid.
  • "{|" after a ":" is much less likely to be a typo and much more likely to be a deliberate insertion since it requires a specific unusual combination to occur at the start of a line (vs. elsewhere on the line).

Overall, I continue to feel this is not an important blocker given the above observations. But given how strongly a number of you seem to feel about it, we'll take a look at it and see what is reasonable on Parsoid's end.