Some rough ideas on how we could integrate several independent proposals and requirements around section editing, element ids and incremental parsing:
- As discussed in T78676, only setting element ids on top-level sections would reduce the compressed HTML size by about 25%. We could use a path scheme to identify information for nested elements, something like an index array relative to the section node: `[0,2,5]` would stand for `sectionNode.childNodes[0].childNodes[2].childNodes[5]`.
- Section editing by top-level section can be implemented efficiently with an offset index and string-based operations. The granularity of edits would still be reasonably small (apart from huge tables, perhaps). With a top-level wikitext section offset index, we could even serialize only the modified top-level sections and reuse the wikitext of the unmodified sections wholesale, without ever loading the full DOM (which currently accounts for about half of the html2wt time).
- Mobile web & apps would like top-level sections (those defined by headings, so often multi-paragraph) to be wrapped in a `<section>` element for rendering purposes (T78734). They would also like to retrieve the lead section separately from the other sections, especially for apps. This can again be supported efficiently with an offset index.
- Incremental parsing in Parsoid could be section-based too. This would also match the expectation that a wikitext section edit does not affect other parts of the page.
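
To make the path scheme from the first point a bit more concrete, here is a rough Python sketch. The `Node` class and the `resolve_path` / `path_to` helpers are made up for illustration and are not Parsoid APIs; the only assumption is that each node exposes an ordered list of children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, like DOM nodes
class Node:
    """Hypothetical stand-in for a DOM node; only top-level sections carry an id."""
    name: str
    id: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def resolve_path(section: Node, path: List[int]) -> Node:
    """Follow an index array such as [0, 2, 5] down from the section node."""
    node = section
    for index in path:
        node = node.children[index]
    return node


def path_to(section: Node, target: Node) -> Optional[List[int]]:
    """Compute the index array for `target`, or None if it is not in this section."""
    if section is target:
        return []
    for i, child in enumerate(section.children):
        sub_path = path_to(child, target)
        if sub_path is not None:
            return [i] + sub_path
    return None
```

A nested element would then be addressed by the pair (section id, path), so only the section itself needs an id attribute in the HTML.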
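
The offset-index idea for section editing and lead-section retrieval could look roughly like the Python sketch below. It is a simplification under stated assumptions: top-level sections are taken to start at level-2 headings (`== Title ==`) at the start of a line, the lead section is everything before the first such heading, and heading-like text inside nowiki, comments or templates is ignored. `build_offset_index` and `splice_sections` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Assumption: a top-level section starts at a level-2 heading on its own line.
HEADING_RE = re.compile(r"^==(?!=).*?(?<!=)==[ \t]*$", re.MULTILINE)


def build_offset_index(wikitext: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets per top-level section.

    Entry 0 is always the lead section; it is empty if the page
    starts directly with a heading.
    """
    starts = [0] + [m.start() for m in HEADING_RE.finditer(wikitext)]
    ends = starts[1:] + [len(wikitext)]
    return list(zip(starts, ends))


def splice_sections(
    wikitext: str,
    index: List[Tuple[int, int]],
    modified: Dict[int, str],
) -> str:
    """Rebuild the page, reusing the original wikitext for unmodified sections."""
    parts = []
    for i, (start, end) in enumerate(index):
        if i in modified:
            parts.append(modified[i])  # freshly serialized section
        else:
            parts.append(wikitext[start:end])  # reused wholesale, no DOM needed
    return "".join(parts)
```

Retrieving the lead section is then just a slice using `index[0]`, and only the sections listed in `modified` would have to go through html2wt.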