VisualEditor adds strange <article> and <footer> tags to page content
Closed, Resolved · Public

Description

I am not sure exactly what the IP user did, but the content of the initial revision of this page looks strange:

https://pl.wikisource.org/w/index.php?title=Strona:Karol_May_-_Przez_pustyni%C4%99_tom_1.djvu/2&oldid=1579899

It seems that some extra non-standard tags were added to the page content, and that the footer content was moved into the main page content.

I suggest disabling VE for the Page namespace on non-test wikis if this behaviour cannot be fixed easily.

Ankry created this task. Sep 13 2017, 10:09 AM
Restricted Application added a subscriber: Aklapper. Sep 13 2017, 10:09 AM
Ankry updated the task description. Sep 13 2017, 10:11 AM
Deskana changed the task status from Open to Stalled. Sep 13 2017, 11:22 AM
Deskana added a subscriber: Deskana.

We'll need more information on how to reproduce this in order to diagnose the problem.

I should note that I ran a search and there are no other occurrences of "<article>" anywhere in the Polish Wikisource. There are only three occurrences anywhere in the English Wikisource. Based on all available data, this appears to be a very rare problem. Disabling VisualEditor as a result of a very rare problem is out of the question.

Deskana triaged this task as Normal priority. Sep 13 2017, 11:22 AM
Deskana moved this task from To Triage to TR1: Releases on the VisualEditor board.
Ankry added a comment. Sep 13 2017, 8:14 PM

> We'll need more information on how to reproduce this in order to diagnose the problem.

I have no idea what the above-mentioned IP user did.

> I should note that I ran a search and there are no other occurrences of "<article>" anywhere in the Polish Wikisource.

@Deskana This tag appeared in only one edit, the one mentioned above, which has already been fixed (it is not the top revision). No more such edits are expected, as broken edits like this one are now blocked by AbuseFilter.
Evidence of them might appear only in the AbuseLog. I do not consider pl.ws to be a test site for random users.

Tpt moved this task from Backlog to VisualEditor on the ProofreadPage board. Oct 30 2017, 4:58 PM
Magol added a subscriber: Magol. May 3 2018, 9:10 AM
matmarex added a subscriber: matmarex. Edited Jun 10 2018, 10:06 AM

I think I know why this could happen; it's specific to the ProofreadPage integration:

  • When receiving the HTML from Parsoid, VE will wrap the whole page in an <article> tag with exactly one each of <header>, <section> and <footer> as children. <header> will contain all content up to the first </noinclude> in wikitext, and <footer> will contain all content following the last <noinclude> in wikitext.
  • When sending the edited HTML back to Parsoid, VE will remove exactly one <article> tag and its children.
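
A minimal sketch of the header/section/footer split described above, in Python. It works on wikitext strings purely for illustration; the real code operates on Parsoid HTML in the browser. The function name `split_page_wikitext` is hypothetical, and whether the `<noinclude>` tags themselves belong to the header and footer parts is an assumption made here.

```python
from typing import Tuple


def split_page_wikitext(wikitext: str) -> Tuple[str, str, str]:
    """Split Page: wikitext into (header, body, footer).

    Assumption: the header runs up to and including the first </noinclude>,
    and the footer starts at the last <noinclude>.
    """
    close_tag = "</noinclude>"
    open_tag = "<noinclude>"

    header_end = wikitext.find(close_tag)
    footer_start = wikitext.rfind(open_tag)
    if header_end == -1 or footer_start == -1:
        # No recognisable header/footer: treat everything as body.
        return "", wikitext, ""

    header_end += len(close_tag)
    if footer_start < header_end:
        # Only one <noinclude> block: header and footer would overlap.
        return "", wikitext, ""

    return wikitext[:header_end], wikitext[header_end:footer_start], wikitext[footer_start:]


page = (
    '<noinclude><pagequality level="1" user="Example" /></noinclude>'
    "Body text."
    "<noinclude><references/></noinclude>"
)
header, body, footer = split_page_wikitext(page)
print(body)
```

The three parts would then correspond to the contents of <header>, <section> and <footer> inside the single wrapping <article>.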

The problem is that nothing stops the user from inserting additional <article> tags, and those will not be removed (we remove exactly one). There is no option to do this in the interface, and we try a bit to prevent users from doing it accidentally (e.g. you can't select across these elements to copy-paste them), but there might be some bug making it possible (I couldn't reproduce it, though; maybe whatever caused it here was fixed). Parsoid will then treat them like any other tag, and they end up in wikitext (<article>, <header>, <section> and <footer> are not valid in wikitext, but they still end up there, so that's doubly broken).

We should be able to remove all of these tags instead of just one. This will still be rather confusing for the user when it happens (rare as that is), but it should not generate messy wikitext.
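
The difference between removing exactly one wrapper and removing all of them can be sketched as follows. This is an illustrative Python model, not the actual ProofreadPage code (which is JavaScript working on the DOM); `Node`, `unwrap_once`, `unwrap_all` and `serialize` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Wrapper tags named in the comment above.
WRAPPER_TAGS = {"article", "header", "section", "footer"}


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag plus children (nodes or text)."""
    tag: str
    children: List[Union["Node", str]] = field(default_factory=list)


def unwrap_once(article: Node) -> list:
    """Drop one <article> and its direct header/section/footer children only."""
    contents = []
    for part in article.children:
        contents.extend(part.children)
    return contents


def unwrap_all(children: list) -> list:
    """Drop every wrapper tag at any depth, keeping the wrapped contents."""
    out = []
    for child in children:
        if isinstance(child, Node):
            inner = unwrap_all(child.children)
            if child.tag in WRAPPER_TAGS:
                out.extend(inner)
            else:
                out.append(Node(child.tag, inner))
        else:
            out.append(child)
    return out


def serialize(children: list) -> str:
    """Render nodes back to a tag string so the two results can be compared."""
    return "".join(
        f"<{c.tag}>{serialize(c.children)}</{c.tag}>" if isinstance(c, Node) else c
        for c in children
    )


# A page whose <section> contains a stray, user-inserted <article>.
stray = Node("article", [Node("header"), Node("section", ["pasted"]), Node("footer", ["f2"])])
page = Node("article", [
    Node("header", ["h"]),
    Node("section", [Node("p", ["text"]), stray]),
    Node("footer", ["f"]),
])

print(serialize(unwrap_once(page)))   # the stray <article> survives
print(serialize(unwrap_all([page])))  # no wrapper tags remain
```

With the stray wrapper in place, the single-unwrap result still contains <article>, which is what would reach Parsoid and end up in wikitext; the recursive version leaves only the wrapped content.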

In the longer term I think we'll need to reconsider the header/section/footer split. It is not really compatible with the wikitext semantics here: there are valid reasons for a tag or a transclusion to span the header/section or section/footer boundary, e.g. formatting in books that spans the page boundary.
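
To illustrate why the split is awkward, here is a small Python sketch that counts unclosed tags in each part of a made-up page where a <div> is opened in the header and closed in the body (as can happen when formatting continues from the previous page of a book). The page content and the helper `unclosed_tags` are assumptions for this example only.

```python
import re


def unclosed_tags(fragment: str, tag: str) -> int:
    """Return (opening tags - closing tags) for `tag` in `fragment`.

    Self-closing tags such as <div/> are not counted as opening tags.
    """
    name = re.escape(tag)
    opens = len(re.findall(rf"<{name}\b[^>]*(?<!/)>", fragment))
    closes = len(re.findall(rf"</{name}\s*>", fragment))
    return opens - closes


# Hypothetical page: indentation started on the previous page continues here.
header = '<noinclude><div style="margin-left:2em"></noinclude>'
body = "rest of the indented passage</div>\nA new paragraph."
footer = "<noinclude><references/></noinclude>"

for name, part in (("header", header), ("body", body), ("footer", footer)):
    print(name, unclosed_tags(part, "div"))
```

The page as a whole is balanced, but the header and body are not balanced on their own, so neither fits cleanly into a separate <header> or <section> element.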

Change 439520 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/ProofreadPage@master] ve.init.mw.ProofreadPagePageTarget: Improve section handling

https://gerrit.wikimedia.org/r/439520

Change 439520 merged by jenkins-bot:
[mediawiki/extensions/ProofreadPage@master] ve.init.mw.ProofreadPagePageTarget: Improve section handling

https://gerrit.wikimedia.org/r/439520

matmarex closed this task as Resolved. Jun 19 2018, 5:43 PM
matmarex removed a project: Patch-For-Review.
matmarex claimed this task.
Restricted Application added a project: User-Ryasmeen. Jun 19 2018, 5:43 PM