The way VisualEditor/ProofreadPage splits page content into header/section/footer is incompatible with existing content (and, more generally, with the layout of physical books). It makes some pages impossible to open in VE, and causes others to become corrupted after saving.
However, this doesn't mean we have to forgo the "visual" split of the page into sections entirely! We will need to rethink and rewrite how it is implemented, but it should be possible to display similar "section headings" in most cases by displaying the <noinclude> and </noinclude> markers the same way we already display other "invisible" tags (e.g. <indicator />).
(I'm filing this task after talking with @Ankry at the 2018 WMPL conference.)
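To make the marker-based idea a bit more concrete, here is a minimal sketch (Python, purely illustrative, not VE or ProofreadPage code) of treating <noinclude> and </noinclude> as standalone markers rather than as containers. The function name `tokenize_markers` and the example page are hypothetical. The point is that the text between two markers never has to be well-formed on its own, so each marker could be rendered as a "section heading" even when it sits in the middle of a table.

```python
import re
from typing import List, Tuple

# An opening or closing <noinclude> marker (matched case-insensitively here).
NOINCLUDE_MARKER = re.compile(r"<(/?)noinclude\s*>", re.IGNORECASE)


def tokenize_markers(wikitext: str) -> List[Tuple[str, str]]:
    """Split wikitext into ("text", ...) runs and ("marker", "open"/"close") tokens.

    Markers are kept as standalone, zero-width tokens instead of wrappers,
    so the text runs between them do not need to be balanced markup.
    """
    tokens: List[Tuple[str, str]] = []
    pos = 0
    for match in NOINCLUDE_MARKER.finditer(wikitext):
        if match.start() > pos:
            tokens.append(("text", wikitext[pos:match.start()]))
        tokens.append(("marker", "close" if match.group(1) else "open"))
        pos = match.end()
    if pos < len(wikitext):
        tokens.append(("text", wikitext[pos:]))
    return tokens


# Hypothetical page whose header only reopens a table started on the previous page.
page = "<noinclude>{|</noinclude>\n| E || F\n|}"
print(tokenize_markers(page))
```

The unbalanced "{|" simply ends up as a text run between two markers, which is exactly the case a container-based model cannot represent.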
For reference, the VisualEditor/ProofreadPage integration currently inserts <article> and <header>/<section>/<footer> tags into the Parsoid HTML before VE parses it, and removes them again after generating the edited HTML. See, for example, this page and its Parsoid HTML:
https://pl.wikisource.org/w/index.php?title=Strona:O_ontologicznej_beznadziejności_logiki,_fizykalizmu_i_pseudo-naukowego_monizmu_wogóle.djvu/7&oldid=1766239
https://pl.wikisource.org/api/rest_v1/page/html/Strona%3AO_ontologicznej_beznadziejności_logiki%2C_fizykalizmu_i_pseudo-naukowego_monizmu_wogóle.djvu%2F7/1766239
This works for simple pages (in this example, the header only contains a <pagequality> tag, and the footer only contains an empty references list and the __NOEDITSECTION__ magic word), but fails in more complicated cases.
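A simplified, string-based model of that wrap/unwrap round trip might look like the sketch below. It is only an illustration under stated assumptions: the real integration manipulates the Parsoid DOM rather than strings, and the names `wrap` and `unwrap` (and the placeholder page parts) are hypothetical.

```python
from typing import Tuple

# Hypothetical wrapper markup following the description above.
OPEN = "<article><header>"
MID1 = "</header><section>"
MID2 = "</section><footer>"
CLOSE = "</footer></article>"


def wrap(header: str, body: str, footer: str) -> str:
    """Wrap the three page parts in <article>/<header>/<section>/<footer>."""
    return OPEN + header + MID1 + body + MID2 + footer + CLOSE


def unwrap(html: str) -> Tuple[str, str, str]:
    """Reverse wrap(): strip the wrappers and return (header, body, footer)."""
    if not (html.startswith(OPEN) and html.endswith(CLOSE)):
        raise ValueError("not a wrapped page")
    inner = html[len(OPEN):-len(CLOSE)]
    header, rest = inner.split(MID1, 1)
    body, footer = rest.rsplit(MID2, 1)
    return header, body, footer


# A simple page: every part is self-contained, so the round trip is lossless.
parts = ("<p>header</p>", "<p>body</p>", "<p>footer</p>")
assert unwrap(wrap(*parts)) == parts
```

This model implicitly assumes that each part is a balanced fragment on its own, which holds for the simple page above but not for content that spans pages.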
Unfortunately for us, paged books can contain "block-level elements" that are split across multiple pages – for example, a table or a quotation indent. When the book is shown unpaged, these must be displayed as a single element (e.g. one table with common borders, rather than two separate tables).
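In Wikisource's wikitext syntax, this is represented by splitting the markup across the pages and wrapping the parts that are only needed when a page is viewed on its own (for a table, the closing |} on the first page and the opening {| on the second page) in <noinclude>…</noinclude>.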
When those two pages are transcluded one after another, they generate a single continuous table. When they are viewed separately, each page displays a table instead of a broken jumble of wikitext.
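The following sketch illustrates why this works, using a deliberately naive model of MediaWiki that only handles <noinclude> markup (real transclusion does much more). The helper names and the two example pages are hypothetical. Viewed alone, each page keeps its <noinclude> content and forms a complete table; transcluded, that content is dropped and the two halves join into one table.

```python
import re
from typing import List

# A whole <noinclude>…</noinclude> block, including its content.
NOINCLUDE_BLOCK = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL)
# Only the <noinclude> and </noinclude> markers themselves.
NOINCLUDE_TAG = re.compile(r"</?noinclude>")


def view_alone(page: str) -> str:
    """Viewing a page on its own: drop the markers, keep what is inside them."""
    return NOINCLUDE_TAG.sub("", page)


def transclude(pages: List[str]) -> str:
    """Transcluding pages: drop <noinclude> blocks entirely, then concatenate."""
    return "".join(NOINCLUDE_BLOCK.sub("", page) for page in pages)


def table_balance(wikitext: str) -> int:
    """Lines opening a table ({|) minus lines closing one (|})."""
    lines = wikitext.splitlines()
    return (sum(line.startswith("{|") for line in lines)
            - sum(line.startswith("|}") for line in lines))


# Hypothetical two-page table: page 1 closes it only in its footer,
# page 2 reopens it only in its header.
page1 = "{|\n| A || B\n|-\n| C || D<noinclude>\n|}</noinclude>"
page2 = "<noinclude>{|</noinclude>\n|-\n| E || F\n|}"

print(table_balance(view_alone(page1)), table_balance(view_alone(page2)))  # 0 0
print(transclude([page1, page2]))
```

Both standalone views are balanced, and the transcluded text contains a single {| … |} pair.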
Unfortunately, our header/section/footer scheme is incompatible with this: you can't wrap another HTML tag around just the opening <table> tag. (You can wrap <noinclude>…</noinclude> around it, because <noinclude> is not an HTML tag; it is processed in the same step as template transclusions {{…}}.) The current code tries really hard to do it anyway, causing various issues:
- Table spanning pages: https://en.wikisource.org/wiki/Page:Indian_mathematics,_Kaye_(1915).djvu/48 – VE does not open and shows an error message.
- Other HTML tags spanning pages: https://en.wikisource.org/wiki/Page:COTUS_(1787_Edition).djvu/1 – VE loads, but the page is corrupted on save.
- Template transclusion spanning pages: https://pl.wikisource.org/wiki/Strona:Lucyna_Ćwierczakiewiczowa_-_365_obiadów_za_5_złotych.djvu/326 – VE loads, but shows an error, and the page is corrupted on save.
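The failure mode behind these issues can be illustrated with a toy nesting check built on Python's standard `html.parser` module. This is not how VE or Parsoid validate anything, and `NestingChecker`, `well_nested` and the two example fragments are hypothetical. It only shows that once a table starts inside the <header> wrapper and continues into the <section> wrapper, the resulting markup cannot be properly nested.

```python
from html.parser import HTMLParser
from typing import List

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class NestingChecker(HTMLParser):
    """Toy checker: records whether end tags match the innermost open tag."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags (<br/>) open and close at once

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # e.g. </header> arriving while <table> is still open
            self.ok = False


def well_nested(html: str) -> bool:
    """True if every tag in the fragment is closed, in the right order."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.ok and not checker.stack


# A simple page: each part is self-contained, so wrapping works.
simple = ("<article><header><p>h</p></header>"
          "<section><p>b</p></section><footer></footer></article>")

# A table that starts in the header part and ends in the body part.
spanning = ("<article><header><table><tr><td>A</td></tr></header>"
            "<section><tr><td>B</td></tr></table></section></article>")

print(well_nested(simple), well_nested(spanning))  # True False
```

No choice of wrapper positions fixes the second case, which is why the wrappers have to go rather than be placed more cleverly.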
In some cases you might argue that the spanning tags are unnecessary (the COTUS and Lucyna Ćwierczakiewiczowa pages probably could be done without them, if we had to), but I do not see an alternative solution for the Indian mathematics page.
I think that in this case, where the real world collides with our model, we have to change the model and not the real world.