This is a task to explore and summarize the trade-offs inherent in different possible paths towards hygienic transclusions.
Problem statement / motivation
Transclusions can currently affect arbitrary parts of the page by producing unbalanced HTML. This causes problems:
- Editing: Visual editing of templates is unreliable and not truly WYSIWYG. The ergonomics of editing typical multi-template content are relatively poor in the wikitext editor, and even worse in VisualEditor.
- Performance: Unpredictable side effects prevent many important optimizations in the parser, so users wait longer than necessary, and more CPU cycles, energy and money are spent.
- Composability: In order to better adapt the user experience to different devices, network conditions and use cases, we are interested in composing content dynamically. Doing this efficiently requires the definition of clear components, and limits on how a component can affect its surrounding content. Current transclusions don't satisfy these requirements.
Requirements
- Transclusions result in a well-defined DOM forest.
- Single-rooted trees would be even more efficient to match.
- Transclusions do not affect surrounding content.
- Corollary: Transclusion scopes are stable across transclusion re-expansions.
- Minimal breakage of existing content. Remaining issues ideally fixable with template changes, not transclusion changes.
- Simple mental model, predictable behavior.
- Ideally, works consistently for old & new revisions.
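The first requirement can be made concrete as a check on a transclusion's expanded HTML. The following is a minimal sketch built on Python's standard html.parser, not a description of how Parsoid decides balance. It is deliberately strict: it ignores HTML's optional end tags, so an unclosed <li> or <p> is reported as unbalanced even though an HTML5 parser would close it implicitly. The names BalanceChecker and is_balanced are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they never open a scope.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class BalanceChecker(HTMLParser):
    """Hypothetical helper: tracks open tags, flags stray or unclosed ones."""

    def __init__(self):
        super().__init__()
        self.stack = []            # tags opened inside the fragment
        self.stray_end_tags = []   # end tags closing something outside it

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens nothing.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # Mismatched or unmatched end tag: the fragment reaches outside itself.
            self.stray_end_tags.append(tag)


def is_balanced(fragment: str) -> bool:
    """True if the fragment closes every tag it opens and nothing else (strict)."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return not checker.stack and not checker.stray_end_tags
```

With this check, the output of a typical "table start" template (an opening <table> with no matching </table>) fails, while an infobox that emits a complete table passes.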
Ideas
The three main approaches currently discussed are:
a) Opt-in: No balancing by default, special syntax for explicitly requesting balancing of specific transclusions.
b) Opt-out: All transclusions are balanced by default, special syntax can be used to explicitly widen the balancing scope to support multi-transclusion content with unbalanced parts.
c) Inference: All transclusions are balanced by default, but statistics and templatedata are used to identify templates that are normally unbalanced, along with the end tags that typically close them. DOM scopes are then established by these typical end templates or tags.
Opt-in
- T114445: [RFC] Balanced templates: A proposal for new markup for opt-in balancing.
Advantages
- Relatively easy to implement.
- Can be gradually phased in.
Disadvantages
- Results in low coverage out of the box.
- Requires changes to most transclusion sites to enable balancing.
- Does not support old revisions.
Opt-out
- DOM notes, proposing a <domparse> extension tag that causes its content to be balanced as one unit. Use case: Multi-template content, like table start / row / end combinations.
Advantages
- Matches common transclusion behavior, and yields predictable behavior for new citations.
Disadvantages
- Risk of breaking existing multi-part content.
- Need to add explicit markup for multi-part transclusions.
- No support for old revisions.
Inference
Automatic inference aims to establish the scope of transclusion balancing using statistics across template uses. It is based on the observation that most templates are balanced (infoboxes, navboxes, citations etc), and a few (such as table start or end templates) are not balanced by design. Based on experience, there are very few ambiguous top-level templates. Parsoid has rich information about which templates are typically balanced / unbalanced, so it is conceivable that we could use these statistics to identify templates that are normally not balanced.
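The classification step can be sketched as a simple aggregation over observed template uses. This is a minimal illustration under assumptions: the input is a list of (template name, produced balanced output?) pairs, for example derived from a per-use balance check, and the thresholds (min_uses, unbalanced_cutoff, balanced_cutoff) are hypothetical placeholders, not tuned values. Templates that fall between the cutoffs are the "ambiguous" ones the paragraph above expects to be rare.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TemplateStats:
    uses: int = 0
    unbalanced: int = 0


def classify_templates(observations, min_uses=20, unbalanced_cutoff=0.95,
                       balanced_cutoff=0.05):
    """Sort templates into balanced / unbalanced / ambiguous buckets.

    observations: iterable of (template_name, produced_balanced_output) pairs.
    All thresholds are illustrative assumptions.
    """
    stats = defaultdict(TemplateStats)
    for name, balanced in observations:
        entry = stats[name]
        entry.uses += 1
        if not balanced:
            entry.unbalanced += 1

    result = {"balanced": set(), "unbalanced": set(), "ambiguous": set(),
              "insufficient_data": set()}
    for name, entry in stats.items():
        if entry.uses < min_uses:
            # Too few uses to trust the ratio either way.
            result["insufficient_data"].add(name)
            continue
        ratio = entry.unbalanced / entry.uses
        if ratio >= unbalanced_cutoff:
            result["unbalanced"].add(name)
        elif ratio <= balanced_cutoff:
            result["balanced"].add(name)
        else:
            result["ambiguous"].add(name)
    return result
```

Templates classified as "unbalanced" would be the candidates for templatedata markup as multi-transclusion start (or end) templates.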
The challenge with inference is establishing a stable transclusion scope. We have not spent a lot of time investigating possible options for this problem yet. Here are some possible ideas:
Thesis 1: Basically all multi-template content is started by a specific template.
- How to verify assumption:
- Collect stats on templated content with data-mw.parts.length > 1.
- Look for multi-template content that only had *end tags* (but not start tags) supplied by transclusions.
- Some possible counter-examples I could think of:
- Partially-templated list -> balanced, constraint: list item
- Plain table with templated rows -> balanced, constraint: table row
- Alternative: Find a limited look-ahead that lets us establish the start token by backtracking, for example a table token triggered by a table cell template.
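The first verification step could start from a scan like the one below. It is a minimal sketch that assumes the data-mw shape Parsoid uses for multi-part content, where parts is a list mixing literal wikitext strings and objects with a "template" key; the function names are hypothetical. It flags multi-part content whose first part is literal wikitext rather than a transclusion, i.e. content not started by a template. Such hits are exactly the cases to review by hand, such as a plain table with templated rows.

```python
import json


def first_part_is_template(parts):
    """True if the first part is a transclusion rather than literal wikitext."""
    return bool(parts) and isinstance(parts[0], dict) and "template" in parts[0]


def thesis1_counterexamples(data_mw_values):
    """Return (multi_part_count, counterexamples) for data-mw JSON strings.

    A counterexample is content with data-mw.parts.length > 1 whose first
    part is literal wikitext, i.e. content not started by a template.
    """
    multi_part = 0
    counterexamples = []
    for raw in data_mw_values:
        parts = json.loads(raw).get("parts", [])
        if len(parts) <= 1:
            continue  # single transclusion: not multi-part content
        multi_part += 1
        if not first_part_is_template(parts):
            counterexamples.append(parts)
    return multi_part, counterexamples
```

Detecting content where transclusions supply only end tags would need the expanded output of each part as well, which this sketch does not inspect.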
Thesis 2: Start template implies a stable DOM parsing scope.
How to verify: Define some scopes & compare with actual scope?
Implementation sketch: Start template defined DOM scope
Goal: For a given start template, figure out what should be included in the <domparse> action.
- Identify multi-transclusion start templates like "table start" and mark them up in templatedata, based on statistics.
- For each start template, define transclusion block boundary. Examples:
- anything up to </table> for a table start template
- anything up to <li> or </li> for a list item template
- When encountering transclusions of start templates:
- Parse tokens from start transclusion to end-of-scope as separate DOM scope.
- Enforce use site content model constraints on transclusion content.
- All other transclusions: Parse as self-contained units, enforce use site content model constraints.
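The scoping steps above can be sketched over a simplified token stream in which each transclusion is one opaque token. This is an illustration under assumptions, not Parsoid's tokenizer or token types: Token, SCOPE_BOUNDARIES, split_into_scopes and the template names "Table start", "Table end" and "List item" are all hypothetical. Following the idea that scopes are closed by typical end templates or tags, a boundary can be an end tag or an end template and is then included in the scope; a boundary start tag such as <li> is left to begin the next unit. Content model enforcement is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str   # "transclusion", "start_tag", "end_tag" or "text"
    value: str  # template name, tag name, or text


# Hypothetical templatedata-style classification: start template -> boundary
# tokens, given as (kind, value) pairs, that close its DOM scope.
SCOPE_BOUNDARIES = {
    "Table start": {("end_tag", "table"), ("transclusion", "Table end")},
    "List item": {("start_tag", "li"), ("end_tag", "li")},
}


def split_into_scopes(tokens):
    """Group a token stream into ("page" | "transclusion" | "dom_scope", tokens).

    A start-template transclusion opens a scope running to its first boundary
    token (or the end of input). End tags and end templates are included in
    the scope; a boundary start tag is left for the next scope. Every other
    transclusion is a self-contained scope. Nesting depth is not tracked.
    """
    scopes = []
    page_run = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.kind != "transclusion":
            page_run.append(tok)
            i += 1
            continue
        if page_run:
            scopes.append(("page", page_run))
            page_run = []
        boundary = SCOPE_BOUNDARIES.get(tok.value)
        if boundary is None:
            scopes.append(("transclusion", [tok]))
            i += 1
            continue
        j = i + 1
        while j < len(tokens) and (tokens[j].kind, tokens[j].value) not in boundary:
            j += 1
        if j < len(tokens) and tokens[j].kind != "start_tag":
            j += 1  # closing tag or end template belongs to this scope
        scopes.append(("dom_scope", tokens[i:j]))
        i = j
    if page_run:
        scopes.append(("page", page_run))
    return scopes
```

Because the boundary scan stops at the first match, a nested table inside a "Table start" scope would close it early; a real implementation would need to count nesting depth for such tags.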
Advantages
- Avoids the need to modify transclusion sites.
- Makes the machines do most of the work.
- Works consistently for both new & old revisions.
- Achieves high coverage.
Disadvantages
- It is an open question whether stable DOM scopes can be established based on template classification.
- No visual indication of DOM scopes in wikitext.
See also
- https://www.mediawiki.org/wiki/Parsoid/DOM_notes#How_to_fix_it_in_the_longer_term: Early discussion of content model constraint enforcement, proposal to coerce transclusion content.
- T105845: RFC: Page components / content widgets
- T114445: [RFC] Balanced templates: A proposal for new markup for opt-in balancing.
- T57524: Enforce proper nesting of most templates, and encapsulate compound content blocks: Older RFC, same idea.