Change Details

This is a task to explore and summarize the trade-offs inherent in different possible paths towards hygienic transclusions.

## Requirements

- Transclusions result in a well-defined DOM forest.
- Transclusions do not affect surrounding content in unpredictable ways.
- Support for
  - WYSIWYG editing
  - optimizations like incremental parsing, reuse, composition
- Minimal breakage of existing content; remaining issues ideally fixable with template changes, not transclusion changes.
- Simple mental model, predictable behavior.
- Ideally, works consistently for old & new revisions.

## Ideas

### Thesis 1: Basically all multi-template content is started by a specific template.

- How to verify assumption:
  - Collect stats on templated content with `data-mw.parts.length > 1`.
  - Look for multi-template content that only had *end tags* (but not start tags) supplied by transclusions.
- Some possible counter-examples I could think of
  - Partially-templated list -> balanced, constraint: list item
  - Plain table with templated rows -> balanced, constraint: table row
- Alternative: Can find limited look-ahead that lets us establish start token by backtracking, for example a table token triggered by a table cell template.

### Thesis 2: Start template implies a stable DOM parsing scope.

How to verify: Define some scopes & compare with actual scope?

## Implementation sketch: Start template defined DOM scope

Goal: For a given start template, figure out what should be included in the `<domparse>` action.

- Identify multi-transclusion start templates like "table start" and mark them up in templatedata, based on statistics.
- For each start template, define transclusion block boundary. Examples:
  - anything up to `</table>` for `table start` template
  - anything up to `<li>` or `</li>` for `list item` template
- When encountering transclusions of start templates:
  - Parse tokens from start transclusion to end-of-scope as separate DOM scope.
  - Enforce use site content model constraints on transclusion content.
- All other transclusions: Parse as self-contained units, enforce use site content model constraints.

## See also

- https://www.mediawiki.org/wiki/Parsoid/DOM_notes#How_to_fix_it_in_the_longer_term: Early discussion of content model constraint enforcement, proposal to coerce transclusion content.
- {T114445}: A proposal for new markup for opt-in balancing.
- {T57524}: Older RFC, same idea.

This is a task to explore and summarize the trade-offs inherent in different possible paths towards hygienic transclusions.

## Problem statement / motivation

Transclusions can currently affect arbitrary parts of the page by producing unbalanced HTML. This causes problems:

- **Editing**: Visual editing of templates is unreliable and not truly WYSIWYG. The ergonomics of editing typical multi-template content are relatively poor in the wikitext editor, and even worse in VisualEditor.
- **Performance**: Unpredictable side effects prevent many important optimizations in the parser, which means that users need to wait longer than necessary, and more CPU cycles, energy and money are spent.
- **Composability**: In order to better adapt the user experience to different devices, network conditions and use cases, we are interested in composing content dynamically. Doing this efficiently requires the definition of clear `components`, and limits on how a component can affect its surrounding content. Current transclusions don't satisfy these requirements.

## Requirements

- Transclusions result in a well-defined DOM forest.
- Transclusions do not affect surrounding content in unpredictable ways.
- Support for
  - WYSIWYG editing
  - optimizations like incremental parsing, reuse, composition
- Minimal breakage of existing content; remaining issues ideally fixable with template changes, not transclusion changes.
- Simple mental model, predictable behavior.
- Ideally, works consistently for old & new revisions.

## Ideas

### Thesis 1: Basically all multi-template content is started by a specific template.

- How to verify assumption:
  - Collect stats on templated content with `data-mw.parts.length > 1`.
  - Look for multi-template content that only had *end tags* (but not start tags) supplied by transclusions.
- Some possible counter-examples I could think of
  - Partially-templated list -> balanced, constraint: list item
  - Plain table with templated rows -> balanced, constraint: table row
- Alternative: Can find limited look-ahead that lets us establish start token by backtracking, for example a table token triggered by a table cell template.

### Thesis 2: Start template implies a stable DOM parsing scope.

How to verify: Define some scopes & compare with actual scope?

## Implementation sketch: Start template defined DOM scope

Goal: For a given start template, figure out what should be included in the `<domparse>` action.

- Identify multi-transclusion start templates like "table start" and mark them up in templatedata, based on statistics.
- For each start template, define transclusion block boundary. Examples:
  - anything up to `</table>` for `table start` template
  - anything up to `<li>` or `</li>` for `list item` template
- When encountering transclusions of start templates:
  - Parse tokens from start transclusion to end-of-scope as separate DOM scope.
  - Enforce use site content model constraints on transclusion content.
- All other transclusions: Parse as self-contained units, enforce use site content model constraints.

## See also

- https://www.mediawiki.org/wiki/Parsoid/DOM_notes#How_to_fix_it_in_the_longer_term: Early discussion of content model constraint enforcement, proposal to coerce transclusion content.
- {T105845}
- {T114445}: A proposal for new markup for opt-in balancing.
- {T57524}: Older RFC, same idea.
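
To make the implementation sketch ("Start template defined DOM scope") more concrete, here is a minimal Python illustration of the scoping step only. It is not Parsoid code. The `Token` type, the `SCOPE_BOUNDARIES` registry, the `split_into_scopes` function and the `table row` template name are all hypothetical, introduced purely for illustration. It also assumes one reading of "anything up to": a closing tag such as `</table>` belongs to the scope it ends, while an opening tag such as `<li>` starts the next unit. Enforcement of content model constraints is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str  # "transclusion", "start_tag", "end_tag" or "text"
    name: str  # template name, tag name, or text content


# Hypothetical registry mirroring the two examples in the sketch: each start
# template maps to the (kind, name) pairs that mark the end of its scope.
SCOPE_BOUNDARIES = {
    "table start": {("end_tag", "table")},
    "list item": {("start_tag", "li"), ("end_tag", "li")},
}


def split_into_scopes(tokens):
    """Group a flat token stream into units that would each be DOM-parsed separately."""
    scopes = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        boundaries = (
            SCOPE_BOUNDARIES.get(tok.name) if tok.kind == "transclusion" else None
        )
        if boundaries is None:
            # Not a start template: treat the token as a self-contained unit.
            scopes.append([tok])
            i += 1
            continue
        # Start template: scan ahead until a boundary token or end of input.
        j = i + 1
        while j < len(tokens) and (tokens[j].kind, tokens[j].name) not in boundaries:
            j += 1
        # Assumption: a closing tag is part of the scope it ends; an opening
        # tag is left for the next unit.
        if j < len(tokens) and tokens[j].kind == "end_tag":
            j += 1
        scopes.append(tokens[i:j])
        i = j
    return scopes


if __name__ == "__main__":
    stream = [
        Token("text", "intro"),
        Token("transclusion", "table start"),
        Token("transclusion", "table row"),
        Token("transclusion", "table row"),
        Token("end_tag", "table"),
        Token("text", "outro"),
    ]
    print([len(scope) for scope in split_into_scopes(stream)])  # prints [1, 4, 1]
```

In this toy model, any start template that appears inside an already open scope is simply absorbed into that scope; whether nested scopes should instead be parsed separately is left open here, as it is in the sketch.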