
Parsoid DOM spec: Improve Cite representation to support section editing & simpler clients
Open, Medium, Public, 0 Estimated Story Points

Description

Citations depend on global page state (specifically the <references /> tag and named citations). This interferes with visual editing of individual sections that don't have the global state for the entire page.

Specifically:

  • a named citation in one section might refer to a citation defined elsewhere on the page.
  • if a section being edited has the <references /> tag and the group attribute is edited on the tag, the visual rendering of the page cannot be updated without having access to the references information on the page (editing from no-group => group is feasible by inspecting the DOM, but going from group => no-group still requires page-global state).
  • Parsoid's DOM representation for <ref>s has cross-references to the section containing the <references /> tag, since the data-mw attribute for a <ref> points to the DOM tree id of the ref-body in the rendering of the references section. While this was done to reduce the size of the DOM that VE has to fetch, it is also a barrier to section editing.
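
To make the cross-reference problem concrete, here is a minimal sketch that scans one section's HTML and lists the ref bodies it points to but does not contain. It assumes a <ref> is marked with typeof="mw:Extension/ref" and that its data-mw has the shape {"body": {"id": ...}}; the helper names (RefScanner, dangling_ref_bodies) are hypothetical and not part of Parsoid.

```python
import json
from html.parser import HTMLParser


class RefScanner(HTMLParser):
    """Collect element ids and the body ids that <ref> data-mw blobs point to."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.body_ids = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.ids.add(attr["id"])
        # Assumption: a Cite <ref> carries typeof="mw:Extension/ref".
        if "mw:Extension/ref" in (attr.get("typeof") or "").split():
            # Assumption: data-mw looks like {"body": {"id": "<dom id>"}}.
            data_mw = json.loads(attr.get("data-mw") or "{}")
            body_id = (data_mw.get("body") or {}).get("id")
            if body_id:
                self.body_ids.add(body_id)


def dangling_ref_bodies(section_html):
    """Return body ids referenced by refs but missing from this fragment."""
    scanner = RefScanner()
    scanner.feed(section_html)
    scanner.close()
    return sorted(scanner.body_ids - scanner.ids)
```

A non-empty result means the section cannot be rendered or edited on its own: the ref bodies live in whichever section holds the <references /> output.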

Possible simplification: Provide full citation metadata for each citation, at point of definition

  • Provide full data-mw for each citation, at the point where the citation is defined. If the reference is to a citation group, still provide full metadata, plus an attribute indicating the group name.
  • Editing:
    • For plain citations (not grouped), the inline data-mw is updated.
    • For grouped citations, the editor can ask the user whether the citation should be disassociated from the group, or whether the group should be updated.
      • Disassociation: Remove group name reference from data-mw.
      • Update group: Keep the group reference. The other references to the same group are updated server-side.
    • Citation removal: Parsoid will handle the re-assignment of primary group definitions in case the primary definition was removed in HTML.
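
The editing rules above can be sketched as a small data model. This is only an illustration under stated assumptions: the Citation record, its field names, and the three functions are hypothetical, and "promote the first remaining member" is one possible re-assignment policy, not something the proposal specifies.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Citation:
    cite_id: str                  # hypothetical per-citation identifier
    body_html: str                # full citation content, carried inline
    group: Optional[str] = None   # group name reference, if any
    primary: bool = False         # True for the group's primary definition


def disassociate(citations: List[Citation], cite_id: str, new_body: str) -> List[Citation]:
    """Edit one citation and drop its group reference; others are untouched."""
    return [
        replace(c, body_html=new_body, group=None, primary=False)
        if c.cite_id == cite_id else c
        for c in citations
    ]


def update_group(citations: List[Citation], cite_id: str, new_body: str) -> List[Citation]:
    """Edit one citation and propagate the new body to its whole group."""
    target = next(c for c in citations if c.cite_id == cite_id)
    return [
        replace(c, body_html=new_body)
        if c.cite_id == cite_id
        or (target.group is not None and c.group == target.group)
        else c
        for c in citations
    ]


def reassign_primaries(citations: List[Citation]) -> List[Citation]:
    """Promote the first remaining member of any group left without a primary."""
    has_primary = {c.group for c in citations if c.group is not None and c.primary}
    result = []
    for c in citations:
        if c.group is not None and c.group not in has_primary:
            c = replace(c, primary=True)
            has_primary.add(c.group)
        result.append(c)
    return result
```

In this sketch the client only ever calls disassociate or edits a single record; update_group and reassign_primaries stand in for the server-side work that the proposal moves into Parsoid.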

Pros

  • Supports efficient section retrieval / editing.
  • Moves some complex update logic to Parsoid, simplifies clients.

Cons

  • Accurate rerendering of the full references list still requires data-mw for full page.

Event Timeline

TL;DR: this will revert the changes from T88290 (Update data-mw encoding for <ref> tags to point to the HTML content in the <references /> output rather than duplicating it), which can mean worst-case impacts like the one in T88290#1054424. The impact on gzipped transfer size will be lower, but it is something to be conscious of.

I whipped up a quick proof-of-concept patch today @ https://gerrit.wikimedia.org/r/#/c/mediawiki/services/parsoid/+/441151/. Sometime this week, after chatting with @Esanders about this, I'll update the task description to reflect the strategy I am using there. But it looks like we don't have to undo the performance fix. Plus, this is a generic solution that works for any DOM subtree (with caveats for dealing with transclusion ranges), not just a section. So, this is future-proofed for fine-grained editing.

Pinging @Catrope, who is working on a similar problem with CX.