This is a task to explore and summarize the trade-offs inherent in different possible paths towards hygienic transclusions.
## Problem statement / motivation
Transclusions can currently affect arbitrary parts of the page by producing unbalanced HTML. This causes problems:
- **Editing**: Visual editing of templates is unreliable and not truly WYSIWYG. The ergonomics of editing typical multi-template content are relatively poor in the wikitext editor, and even worse in VisualEditor.
- **Performance**: Unpredictable side effects prevent many important optimizations in the parser, which means that users need to wait longer than necessary, and more CPU cycles, energy and money are spent.
- **Composability**: In order to better adapt the user experience to different devices, network conditions and use cases, we are interested in composing content dynamically. Doing this efficiently requires the definition of clear `components`, and limits on how a component can affect its surrounding content. Current transclusions don't satisfy these requirements.
## Requirements
- Transclusions result in a well-defined DOM forest.
- Transclusions do not affect surrounding content in unpredictable ways.
- Support for
  - WYSIWYG editing
  - optimizations like incremental parsing, reuse, composition
- Minimal breakage of existing content; remaining issues ideally fixable with
template changes, not transclusion changes.
- Simple mental model, predictable behavior.
- Ideally, works consistently for old & new revisions.
## Ideas
### Thesis 1: Basically all multi-template content is started by a specific template.
- How to verify this assumption:
  - Collect stats on templated content with `data-mw.parts.length > 1`.
  - Look for multi-template content that only had *end tags* (but not start
    tags) supplied by transclusions.
- Some possible counter-examples I could think of:
  - Partially-templated list -> balanced, constraint: list item
  - Plain table with templated rows -> balanced, constraint: table row
  - Alternative: We may be able to find a limited look-ahead that lets us
    establish the start token by backtracking, for example a table token
    triggered by a table cell template.
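
The stats collection suggested above could be prototyped roughly as follows. This is a minimal sketch, not Parsoid code: it assumes the input is Parsoid-style HTML in which transclusion output carries `typeof="mw:Transclusion"` and a JSON `data-mw` attribute with a `parts` array, and that template parts expose their name at `template.target.wt`. The class and function names (`TransclusionStats`, `collect_stats`) are made up for illustration.

```python
import json
from collections import Counter
from html.parser import HTMLParser


class TransclusionStats(HTMLParser):
    """Collects data-mw 'parts' statistics from Parsoid-style HTML (sketch)."""

    def __init__(self):
        super().__init__()
        self.total = 0                    # transclusions seen
        self.multi_part = 0               # transclusions with parts.length > 1
        self.first_templates = Counter()  # what starts multi-part content

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Assumption: transclusion wrappers are marked via the typeof attribute.
        if "mw:Transclusion" not in (attr.get("typeof") or "").split():
            return
        raw = attr.get("data-mw")
        if not raw:
            return
        parts = json.loads(raw).get("parts", [])
        self.total += 1
        if len(parts) > 1:
            self.multi_part += 1
            first = parts[0]
            # Parts are either template objects or plain wikitext strings.
            if isinstance(first, dict) and "template" in first:
                name = first["template"]["target"]["wt"].strip()
            else:
                name = "(plain wikitext)"
            self.first_templates[name] += 1


def collect_stats(html):
    """Parse one HTML document and return the populated stats object."""
    parser = TransclusionStats()
    parser.feed(html)
    parser.close()
    return parser
```

Multi-part content whose first part is plain wikitext rather than a template would be a candidate counter-example to Thesis 1, so the `(plain wikitext)` bucket is the number to watch.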
### Thesis 2: Start template implies a stable DOM parsing scope.
How to verify: Define some scopes & compare with actual scope?
## Implementation sketch: Start template defined DOM scope
Goal: For a given start template, figure out what should be included in the
`<domparse>` action.
- Identify multi-transclusion start templates like "table start" and mark them
up in templatedata, based on statistics.
- For each start template, define the transclusion block boundary. Examples:
  - anything up to `</table>` for the `table start` template
  - anything up to `<li>` or `</li>` for the `list item` template
- When encountering transclusions of start templates:
  - Parse tokens from the start transclusion to end-of-scope as a separate DOM scope.
  - Enforce use-site content model constraints on transclusion content.
- All other transclusions: Parse as self-contained units, enforce use-site
  content model constraints.
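
To make the scoping rule concrete, here is a toy sketch of the grouping step over a simplified token stream. Everything in it is an assumption for illustration: the `(kind, value)` token shape, the `SCOPE_END` table (standing in for whatever would be recorded in templatedata), the choice to include a closing tag in the scope but leave an opening tag for the next scope, and the fallback of running an unterminated scope to the end of input. Content model enforcement is not modelled.

```python
from typing import List, Tuple

Token = Tuple[str, str]           # (kind, value); kind is "tpl", "tag" or "text"
Group = Tuple[str, List[Token]]   # ("scope" | "unit" | "plain", tokens)

# Hypothetical per-template scope boundaries, mirroring the examples above.
SCOPE_END = {
    "table start": {"</table>"},
    "list item": {"<li>", "</li>"},
}


def group_dom_scopes(tokens: List[Token]) -> List[Group]:
    """Split a token stream into separately parseable DOM scopes (sketch)."""
    groups: List[Group] = []
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "tpl" and value in SCOPE_END:
            end_tags = SCOPE_END[value]
            j = i + 1
            # Scan forward to the first end-of-scope tag, or to end of input.
            while j < len(tokens) and not (
                tokens[j][0] == "tag" and tokens[j][1] in end_tags
            ):
                j += 1
            # Assumption: a closing tag belongs to this scope, while an
            # opening tag (such as the next <li>) starts whatever follows.
            if j < len(tokens) and tokens[j][1].startswith("</"):
                j += 1
            groups.append(("scope", tokens[i:j]))
            i = j
        elif kind == "tpl":
            # All other transclusions are parsed as self-contained units.
            groups.append(("unit", [tokens[i]]))
            i += 1
        else:
            groups.append(("plain", [tokens[i]]))
            i += 1
    return groups
```

Each `scope` and `unit` group would then be handed to its own DOM parse, which is what keeps a transclusion from affecting content outside its group.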
## See also
- https://www.mediawiki.org/wiki/Parsoid/DOM_notes#How_to_fix_it_in_the_longer_term: Early discussion of content model constraint enforcement, proposal to coerce transclusion content.
- {T105845}
- {T114445}: A proposal for new markup for opt-in balancing.
- {T57524}: Older RFC, same idea.