This is a task to explore and summarize the trade-offs inherent in different possible paths towards hygienic transclusions.
## Problem statement / motivation
Transclusions can currently affect arbitrary parts of the page by producing unbalanced HTML. This causes problems:
- **Editing**: Visual editing of templates is unreliable and not truly WYSIWYG. The ergonomics of editing typical multi-template content are relatively poor in the wikitext editor, and even worse in VisualEditor.
- **Performance**: Unpredictable side effects prevent many important optimizations in the parser, which means that users need to wait longer than necessary, and more CPU cycles, energy and money are spent.
- **Composability**: In order to better adapt the user experience to different devices, network conditions and use cases, we are interested in composing content dynamically. Doing this efficiently requires the definition of clear `components`, and limits on how a component can affect its surrounding content. Current transclusions don't satisfy these requirements.
## Requirements
- Transclusions result in a well-defined DOM forest.
- Transclusions do not affect surrounding content in unpredictable ways.
- Support for
  - WYSIWYG editing
  - optimizations like incremental parsing, reuse, composition
- Minimal breakage of existing content; remaining issues ideally fixable with
  template changes, not transclusion changes.
- Simple mental model, predictable behavior.
- Ideally, works consistently for old & new revisions.
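
To make the "well-defined DOM forest" requirement concrete, here is a minimal sketch of a balance check on the HTML produced by a single transclusion. It is an illustration only, not Parsoid's logic: the names `is_balanced` and `_BalanceChecker` are hypothetical, and the check is deliberately stricter than an HTML5 tree builder, which would tolerate omitted end tags such as `</li>`.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they cannot unbalance a fragment.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _BalanceChecker(HTMLParser):
    """Tracks open elements on a stack (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # End tag without a matching open element in this fragment.
            self.balanced = False


def is_balanced(fragment):
    """True if every start tag in the fragment is closed inside it, in order."""
    checker = _BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.balanced and not checker.stack
```

A "table start" style template that emits `<table><tr><td>x` fails this check, as does a matching "table end" template that emits only the closing tags; a template emitting the complete table passes.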
## Ideas
The three main approaches currently discussed are:
a) **Opt-in**: No balancing by default, special syntax for explicitly requesting balancing of specific transclusions.
b) **Opt-out**: All transclusions are balanced by default, special syntax can be used to explicitly widen the balancing scope to support multi-transclusion content with unbalanced parts.
c) **Inference**: All transclusions are balanced by default, but statistics and templatedata are used to identify templates that are normally unbalanced, as well as the normal end tags. DOM scopes are established by typical end templates or tags.
### Possible approaches for inference
#### Thesis 1: Basically all multi-template content is started by a specific template.
- How to verify assumption:
  - Collect stats on templated content with `data-mw.parts.length > 1`.
  - Look for multi-template content that only had *end tags* (but not start
    tags) supplied by transclusions.
- Some possible counter-examples I could think of:
  - Partially-templated list -> balanced, constraint: list item
  - Plain table with templated rows -> balanced, constraint: table row
- Alternative: Find a limited look-ahead that lets us establish the start
  token by backtracking, for example a table token triggered by a table cell
  template.
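
The stats collection above could be sketched roughly as follows. This assumes each input is the JSON text of a `data-mw` attribute, and that template parts look like `{"template": {"target": {"wt": ...}}}` while interleaved wikitext appears as plain strings; the function name `multi_part_stats` and the return shape are made up for illustration.

```python
import json
from collections import Counter


def multi_part_stats(data_mw_blobs):
    """Summarize which template starts each multi-part transclusion block.

    Returns (total_multi, start_templates, no_leading_template), where
    start_templates counts the template name found in parts[0], and
    no_leading_template counts blocks whose first part is not a template.
    """
    start_templates = Counter()
    no_leading_template = 0
    total_multi = 0
    for blob in data_mw_blobs:
        parts = json.loads(blob).get("parts", [])
        if len(parts) <= 1:
            continue  # single-part content is not relevant to Thesis 1
        total_multi += 1
        first = parts[0]
        if isinstance(first, dict) and "template" in first:
            name = first["template"]["target"]["wt"].strip()
            start_templates[name] += 1
        else:
            no_leading_template += 1
    return total_multi, start_templates, no_leading_template
```

If Thesis 1 holds, `start_templates` should be dominated by a small set of names and `no_leading_template` should stay close to zero. Blocks counted in `no_leading_template` are the candidates to inspect for the "end tags only" case.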
#### Thesis 2: Start template implies a stable DOM parsing scope.
How to verify: Define some scopes & compare with actual scope?
#### Implementation sketch: DOM scope defined by start template
Goal: For a given start template, figure out what should be included in the
`<domparse>` action.
- Identify multi-transclusion start templates like "table start" and mark them
  up in templatedata, based on statistics.
- For each start template, define the transclusion block boundary. Examples:
  - anything up to `</table>` for the `table start` template
  - anything up to `<li>` or `</li>` for the `list item` template
- When encountering transclusions of start templates:
  - Parse tokens from the start transclusion to end-of-scope as a separate DOM scope.
  - Enforce use-site content model constraints on transclusion content.
- All other transclusions: Parse as self-contained units, enforce use-site
  content model constraints.
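
The scoping steps above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Token` type, the `SCOPE_END_TAGS` table standing in for templatedata, and the choice to include a closing boundary tag in the scope while leaving an opening boundary tag (the next `<li>`) outside it. It assumes a token stream in which transclusions are marked and end tags are already visible as tag tokens. Content model enforcement is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str   # "transclusion", "tag", or "text"
    value: str  # template name, a tag such as "</table>", or text


# Hypothetical stand-in for templatedata: start template -> end-of-scope tags.
SCOPE_END_TAGS = {
    "table start": {"</table>"},
    "list item": {"<li>", "</li>"},
}


def split_into_scopes(tokens):
    """Group tokens into DOM parsing scopes.

    A start-template transclusion opens a scope that runs to its boundary
    tag (or to the end of input). Every other token is its own scope.
    """
    scopes = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        end_tags = None
        if tok.kind == "transclusion":
            end_tags = SCOPE_END_TAGS.get(tok.value)
        if end_tags is None:
            scopes.append([tok])
            i += 1
            continue
        j = i + 1
        while j < len(tokens) and not (
            tokens[j].kind == "tag" and tokens[j].value in end_tags
        ):
            j += 1
        # A closing tag belongs to the scope; an opening tag starts new content.
        if j < len(tokens) and tokens[j].value.startswith("</"):
            j += 1
        scopes.append(tokens[i:j])
        i = j
    return scopes
```

Each returned group would then be handed to a separate DOM parse, with the use-site constraints applied to the result.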
## See also
- https://www.mediawiki.org/wiki/Parsoid/DOM_notes#How_to_fix_it_in_the_longer_term: Early discussion of content model constraint enforcement, proposal to coerce transclusion content.
- {T105845}
- {T114445}: A proposal for new markup for opt-in balancing.
- {T57524}: Older RFC, same idea.