Page MenuHomePhabricator

End table added at an obviously incorrect place
Closed, DeclinedPublic

Description

When a table end |} is missing, and VE has no clue where the table should really end, it shouldn't add it at random, especially at a place that is obviously a mistake...
https://fr.wikipedia.org/w/index.php?title=%C3%89lection_r%C3%A9gionale_de_2015_en_%C3%8Ele-de-France&type=revision&diff=119750680&oldid=119750664

Event Timeline

NicoV raised the priority of this task from to Needs Triage.
NicoV updated the task description. (Show Details)
NicoV added projects: Parsoid, VisualEditor.
NicoV added subscribers: NicoV, Aklapper.

It is actually not random. It is at the end of the document which makes sense if you think about it.

@ssastry For me, the end of the document is clearly one place that is 100% sure to be a mistake : table is part of the article text, categories should always go after the text ; by putting the table end after the categories, you simply go against one of the most basic convention...

But even if it's not at random : adding it without knowing where it makes sense is bad...

@NicoV, I understand, but here are some thoughts that touch on a few different aspects of this.

wikitext (as a language) doesn't provide clear guidance about how unclosed tags are to be handled -- the behavior is somewhat arbitrary and left to whatever html-parsing tools are used in the wikitext-parsers (Tidy, HTML5 tree building algorithm). But yes, we could code parsing behavior based on the specific tags, and examining other context, but not all conventions are universal across wikis which makes this kind of "what did the editor really want to do here" kind of second-guessing messy. For sure, we don't want to code up a bunch of hacks to work around these issues.

Note that even if we closed the table before the categories, this is still an unsatisfactory result since we've still swallowed up entire sections within the table. That is definitely not what the editor intended. And, we (=PHP parser, Parsoid) cannot really arbitrarily close the table at some other place since it is possible (and there is wikitext out there) to embed sections in tables.

So, the real solution here is to:

  • think about how we introduce some notion of scoping in wikitext ( T114444 )
  • take to completion tools that we've already started that detects and flags markup "errors" ( T48705 )

In general, given our limited resources, it is not the best use of our time fixing edge case issues instead of focusing on long-term projects like the above.

Well, I would say that the safe and cheap approach is : If you don't know where to add a table end, just don't add it... and certainly not add it in some place that has a lot more chances of being incorrect than correct

Unfortunately, it is a bit trickier than that. The table is closed when parsing wikitext to HTML. And, there is no way to avoid it. Parsoid almost always knows that the table was auto-closed because of a missing tag. So, in theory, we can avoid closing it. In practice, it will need more logic for the following reasons.

Selective serialization already handles the scenerio when the table or any of its descendent children are not edited -- it leaves it untouched. However, whenever anything in the table is edited, parts of the table get serialized with the default serialization algorithm. In the general case, we do not use information about auto-closed tags since it is not always accurate to use that information in the presence of edits (T57410 exists for example).

So, one way to fix this would be to be detect this scenario as one where it is safe to use parse-time autoInsertedEnd flags and skip the "|}". However, this is not so obvious. Once contents of the table has been edited, we cannot clearly determine that the markup error has also not been fixed up some fashion, i.e. some editing client might notice the misnesting of sections in a table and move it out, which effectively means that the table has been edited to closed it properly and it would be an error for Parsoid to skip the "|}".

Once again, it doesn't make sense to add extra complexity for what does seem to be some edge case behavior. We won't really solve the issue any way and just add other edge cases besides making Parsoid itself more complex for no good reason.

ssastry claimed this task.

I am going to decline this, but please feel free to reopen this with an explanation if you think I am missing something here.