Page MenuHomePhabricator

VisualEditor/ProofreadPage splitting of the page content into header/section/footer is incompatible with existing content (and more generally, with the layout of physical books)
Open, LowPublic

Assigned To
None
Authored By
matmarex
Jun 11 2018, 12:35 AM
Referenced Files
F21371555: image.png
Jun 11 2018, 12:35 AM
F21369465: image.png
Jun 11 2018, 12:35 AM
F21348137: image.png
Jun 11 2018, 12:35 AM
F21348339: modified.html
Jun 11 2018, 12:35 AM
F21348340: original.html
Jun 11 2018, 12:35 AM
F21390147: image.png
Jun 11 2018, 12:35 AM
F21348473: image.png
Jun 11 2018, 12:35 AM

Description

VisualEditor/ProofreadPage splitting of the page content into header/section/footer is incompatible with existing content (and more generally, with the layout of physical books). It causes pages to be unopenable in VE or to become corrupted after saving.

image.png (1×1 px, 791 KB)

However, this doesn't mean we have to entirely forego the "visual" split of the page into sections! We will need to rethink and rewrite how it is implemented, but it should be possible to display similar "section headings" in most cases, using the same approach for <noinclude> and </noinclude> markers as we use for displaying more normal "invisible" tags (e.g. <indicator />).

(I'm filing the task after talking with @Ankry at 2018 WMPL conference)


For reference, VisualEditor/ProofreadPage integration currently inserts <article> and <header>/<section>/<footer> tags into the Parsoid HTML before parsing it, like this: (and removes them after generating the edited HTML)

https://pl.wikisource.org/w/index.php?title=Strona:O_ontologicznej_beznadziejności_logiki,_fizykalizmu_i_pseudo-naukowego_monizmu_wogóle.djvu/7&oldid=1766239
https://pl.wikisource.org/api/rest_v1/page/html/Strona%3AO_ontologicznej_beznadziejności_logiki%2C_fizykalizmu_i_pseudo-naukowego_monizmu_wogóle.djvu%2F7/1766239

image.png (416×1 px, 70 KB)
image.png (416×1 px, 72 KB)

This works for simple pages (in this example, the header only contains a <pagequality> tag, and the footer only contains empty references and __NOEDITSECTION__ magic word), but fails in more complicated cases.


Unfortunately for us, paged books can contain "block-level elements" that are split across multiple pages – for example, a table or a quotation indent. When the book is shown unpaged, these must be displayed as a single element (e.g. one table with common borders, rather than two separate tables).

In Wikisources' wikitext syntax, this would be represented like this (simplified example):

1Lorem ipsum.
2
3{|
4! Header A
5! Header B
6|-
7| Cell A1
8| Cell B1
9<noinclude>
10|}
11</noinclude>
1<noinclude>
2{|
3</noinclude>
4|-
5| Cell A1
6| Cell B1
7|}
8
9Dolor sit amet.

When those two pages are transcluded one after another, they generate a single continuous table. When they are viewed separately, each page displays a table instead of a broken jumble of wikitext.

Unfortunately our header/section/footer scheme is incompatible with this – you can't wrap another HTML tag around just the opening <table> tag. (You can wrap a <noinclude></noinclude> around it, because it is not a HTML tag, but instead parsed at the same step as parsing template transclusions {{}}). Current code tries really hard to do it anyway, causing different issues:

  • Table spanning pages: https://en.wikisource.org/wiki/Page:Indian_mathematics,_Kaye_(1915).djvu/48 – VE does not open, error message:
    1Uncaught TypeError: Cannot read property 'getModel' of null
    2 at VeCeTableRowNode.ve.ce.TableRowNode.setupMissingCell (load.php?…:794)
    3 at VeCeTableRowNode.ve.ce.TableRowNode.onSetup (load.php?…:793)
    4 at VeCeTableRowNode.oo.EventEmitter.emit (<anonymous>:153:486)
    5 at VeCeTableRowNode.ve.ce.View.setLive (load.php?…:615)
    6 at VeCeTableRowNode.ve.ce.BranchNode.setLive (load.php?…:625)
    7 at VeCeMwPagesSectionNode.ve.ce.BranchNode.setLive (load.php?…:625)
    8 at VeCeArticleNode.ve.ce.BranchNode.setLive (load.php?…:625)
    9 at VeCeDocumentNode.ve.ce.BranchNode.setLive (load.php?…:625)
    10 at VeCeSurface.ve.ce.Surface.initialize (load.php?…:667)
    11 at VeUiMWSurface.ve.ui.Surface.initialize (load.php?…:814)
  • Other HTML tags spanning pages: https://en.wikisource.org/wiki/Page:COTUS_(1787_Edition).djvu/1 – VE loads, there's corruption on save:
    image.png (980×1 px, 148 KB)
  • Template transclusion spanning pages: https://pl.wikisource.org/wiki/Strona:Lucyna_Ćwierczakiewiczowa_-_365_obiadów_za_5_złotych.djvu/326 – VE loads, but there's an error, then corruption on save:
    1Uncaught TypeError: Cannot read property '$element' of null
    2 at load.php?…:24803
    3 ...
    4 (anonymous) @ load.php?…:24803
    5 requestAnimationFrame (async)
    6 ve.ce.FocusableNode.updateInvisibleIcon @ load.php?…:24800
    7 ve.ce.FocusableNode.onFocusableSetup @ load.php?…:24786
    8 oo.EventEmitter.emit @ VM682:246
    9 ve.ce.View.setLive @ load.php?…:23872
    10 ve.ce.BranchNode.setLive @ load.php?…:24234
    11 ve.ce.BranchNode.setLive @ load.php?…:24234
    12 ve.ce.BranchNode.setLive @ load.php?…:24234
    13 ve.ce.Surface.initialize @ load.php?…:25713
    14 ve.ui.Surface.initialize @ load.php?…:30408
    15 (anonymous) @ VM718:135
    16 ...
    image.png (980×1 px, 387 KB)

In some cases you might argue that the spanning tags are unnecessary (the COTUS and Lucyna Ćwierczakiewiczowa pages probably could be done without them, if we had to), but I do not see an alternative solution for the Indian mathematics page.

I think that in this case of real world colliding with our model, we have to change the model and not the real world.

Event Timeline

Change 439520 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/ProofreadPage@master] ve.init.mw.ProofreadPagePageTarget: Improve section handling

https://gerrit.wikimedia.org/r/439520

The patch above only disables the current behavior in some cases where we can detect that it will cause corruption. We will get uglier interface for those cases now (with no way for the user to tell which part of the page is a header/footer), but that's better than breaking the wikitext contents.

Deskana triaged this task as Low priority.
Deskana edited projects, added VisualEditor (Current work); removed VisualEditor.
Deskana moved this task from Incoming to Code review on the VisualEditor (Current work) board.

Change 439520 merged by jenkins-bot:
[mediawiki/extensions/ProofreadPage@master] ve.init.mw.ProofreadPagePageTarget: Improve section handling

https://gerrit.wikimedia.org/r/439520

Vvjjkkii renamed this task from VisualEditor/ProofreadPage splitting of the page content into header/section/footer is incompatible with existing content (and more generally, with the layout of physical books) to wabaaaaaaa.Jul 1 2018, 1:05 AM
Vvjjkkii removed matmarex as the assignee of this task.
Vvjjkkii raised the priority of this task from Low to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
Ankry renamed this task from wabaaaaaaa to VisualEditor/ProofreadPage splitting of the page content into header/section/footer is incompatible with existing content (and more generally, with the layout of physical books).Jul 1 2018, 7:11 AM
Ankry assigned this task to matmarex.
Ankry lowered the priority of this task from High to Low.
Ankry updated the task description. (Show Details)
Ankry added subscribers: gerritbot, Aklapper.