[RFC] Balanced templates
Open, Stalled, MediumPublic

Description

(These were originally called "hygienic templates", which got confused with hygienic template arguments. The latter are now called "heredoc" arguments, and "hygiene" is no more.)

As described in my Wikimania 2015 talk (starting at slide 27), there are a number of reasons to mark certain templates as "balanced". Foremost among them: to allow high-performance incremental update of page contents after templates are modified, and to allow safe editing of template uses using HTML-based tools such as Visual Editor or jsapi. More discussion of motivation is at T130567 (and covered in RFC meeting E159).

"Balance" means (roughly) that the output of the template is a complete DocumentFragment: every open tag is closed. Furthermore, there are some restrictions on context to ensure there are no open tags which the template will implicitly close, nor nodes which the HTML adoption agency algorithm will reorder. (More precise details below.)

Template balance is enforced: tags are closed or removed as necessary to ensure that the output satisfies the necessary constraints, regardless of the values of the template arguments or how child templates are expanded.

Properly balanced template inclusion allows efficient update of articles by doing substring substitution for template bodies, without having to expand all templates to wikitext and reparse from scratch. It also guarantees that the template (and surrounding content) will be editable in Visual Editor; mistakes in template arguments won't "leak out" and prevent editing of surrounding content.

Wikitext Syntax
After some bikeshedding, we decided that balance should be an "opt-in" property of templates, indicated by adding a {{#balance:TYPE}} marker to the content. This syntax leverages the existing "parser function" syntax, and allows for different types of balance to be named where TYPE is.

We propose three forms of balance, of which the first and perhaps the second are likely to be implemented initially. Other balancing modes would provide safety in different HTML-parsing contexts, and may be added in the future if there is need.

  1. {{#balance:block}} (informally) would close any open <p>/<a>/<h*>/<table> tags in the article preceding the template insertion site. In the template content all tags left open at the end will be closed, but there is no other restriction. This is similar to how block-level tags work in HTML 5. This is useful for navboxes and other "block" content.
    • Formally: in context preceding template, close p, a, table, h[1-6], style, script, xmp, iframe, noembed, noframes, plaintext, noscript, textarea, select, template, dd, dt, and pre. (Alternatively, close all but div and section.) After template, close all open tags.
  2. {{#balance:inline}} would only allow inline (i.e. phrasing) content and silently delete block-level tags seen in the content. But because of this, it can be used inside a block-level context without closing active <p>/<a>/<h*> in the article (as {{#balance:block}} would). This is useful for simple plain text templates, e.g. age calculation.
    • Formally: In context preceding template, close style, script, xmp, iframe, noembed, noframes, plaintext, noscript, textarea, table, ruby, select, and template. These are the tags which change tokenizer or parser modes. (ruby affects subsequent parsing of rb/rtc/rp/rt.) Wrap the template with <span>...</span>, in order to trigger AFE reconstruction. Inside the template, strip address, article, aside, blockquote, center, details, dialog, dir, div, dl, fieldset, figcaption, figure, footer, header, hgroup, main, menu, nav, ol, p, section, summary, ul, h[1-6], pre, listing, form, li, dd, dt, plaintext, button, a, nobr, hr, isindex, xmp, optgroup, and option. These are the elements which can trigger a close tag to be emitted in body parsing mode.
    • To see the need for <span> wrapping, consider <div><b><i>foo</b>{{template}}</div> where the template is <meta>bar<b>bat</b>. The output with <span> wrapping is: <div><b><i>foo</i></b><i><span><meta>bar<b>bat</b></span></i></div> whereas without span wrapping we'd get <div><b><i>foo</i></b><meta><i>bar<b>bat</b></i></div> -- note that the <span> causes the <i> to precede the template content, instead of migrating inside it.
  3. {{#balance:table}} would allow insertion inside <table> and allow <td>/<th> tags in the content. The exact semantics need to be nailed down; it is possible that the inline mode might be extended to allow safe insertion inside <td>/<th> elements, which might remove some of the need for a special table mode. Templates which wish to insert rows or sequences of cells might still need a special mode.

We expect {{#balance:block}} to be most useful for the large-ish templates whose efficient replacement would make the most impact on performance, and so we propose {{#balance:}} as shorthand for {{#balance:block}}. (The current wikitext grammar does not allow {{#balance}}, since the trailing colon is required in parser function names, but the current patch set accommodates this without too much pain.)
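As a rough illustration, the tag lists in the formal definitions above can be written down as plain data. The following is only a sketch: the names CLOSE_BEFORE, STRIP_INSIDE_INLINE, normalize_mode and closes_in_context are hypothetical and do not come from the patch set, and table mode is left out because its semantics are not yet defined.

```python
# Hypothetical data-only sketch of the block and inline definitions.
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

# Tags closed in the context preceding the template, per mode.
CLOSE_BEFORE = {
    "block": {
        "p", "a", "table", *HEADINGS, "style", "script", "xmp", "iframe",
        "noembed", "noframes", "plaintext", "noscript", "textarea",
        "select", "template", "dd", "dt", "pre",
    },
    "inline": {
        "style", "script", "xmp", "iframe", "noembed", "noframes",
        "plaintext", "noscript", "textarea", "table", "ruby", "select",
        "template",
    },
}

# Tags stripped from the content of an inline-balanced template.
STRIP_INSIDE_INLINE = {
    "address", "article", "aside", "blockquote", "center", "details",
    "dialog", "dir", "div", "dl", "fieldset", "figcaption", "figure",
    "footer", "header", "hgroup", "main", "menu", "nav", "ol", "p",
    "section", "summary", "ul", "pre", "listing", "form", "li", "dd",
    "dt", "plaintext", "button", "a", "nobr", "hr", "isindex", "xmp",
    "optgroup", "option",
} | HEADINGS


def normalize_mode(arg: str) -> str:
    """Map the parser function argument to a mode; '' means block."""
    mode = arg.strip().lower() or "block"
    if mode not in ("block", "inline", "table"):
        raise ValueError(f"unknown balance mode: {arg!r}")
    return mode


def closes_in_context(mode: str, tag: str) -> bool:
    """True if an open `tag` in the context is closed before the template."""
    tags = CLOSE_BEFORE.get(normalize_mode(mode))
    if tags is None:
        raise NotImplementedError("table mode semantics are not yet defined")
    return tag.lower() in tags
```

For instance, closes_in_context("", "p") is true because {{#balance:}} is shorthand for block mode, while closes_in_context("inline", "p") is false.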

Violations of content restrictions (e.g., a <p> tag in a {{#balance:inline}} template) would be errors, but how these errors would be conveyed is an orthogonal issue. Currently bad tags are stripped silently. Some other options for error reporting include ugly bold text visible to readers (like {{cite}}), wikilint-like reports, or inclusion in [[Category:Balance Errors]]. Note that errors might not appear immediately: they may only occur when some other included template is edited to newly produce disallowed content, or only when certain values are passed as template arguments.

Implementation
Implementation is slightly different in the PHP parser and in Parsoid. Incremental parsing/update would necessarily not be done in the PHP parser, but it does need to enforce equivalent content model constraints for consistency.

In both implementations, we begin by recording the balance mode desired by each transclusion and then adding a synthetic <mw:balance-TYPE> tag around the transcluded content.

PHP parser implementation strategy:

  • In the Sanitizer validate the synthetic <mw:balance-TYPE> tag to prevent forgery in wikitext, but otherwise pass the tag through.
  • Just before handing the output to tidy/depurate, perform a "cheap" parse by splitting on < characters, as the Sanitizer does, and naïvely tracking open/close tags seen on a stack (again, as the Sanitizer already does). When the <mw:balance-TYPE> open/close tag is seen, traverse the open tag stack and emit close tags as needed. Even though this pass is just an approximation of true HTML5 parsing, and doesn't accurately track AFE state or implicitly generated tags (like <tbody>), this has been validated to be sufficient. For example, even though we don't track the implicit <tbody> tag on our naïve stack, it can only be present if there was an outer <table> tag, and emitting </table> is sufficient to close the implicit <tbody>.
  • So far it has not been necessary to access "precise" HTML5 parse information in order to implement balancing. If this is necessary in the future, a pure-PHP implementation of the HTML5 Tree Builder pass has been implemented.
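The "cheap" pass described above might look roughly like the following for block mode. This is a minimal sketch, not the PHP implementation: it uses a regular expression in place of splitting on < characters, the function name balance_block is hypothetical, and it assumes that closing a listed tag also closes every tag opened after it on the naive stack.

```python
import re

# Naive tag matcher; attribute values containing '<' or '>' are not handled.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9:-]*)([^<>]*)>")
MARKER = "mw:balance-block"
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
CLOSE_BEFORE_BLOCK = {
    "p", "a", "table", "h1", "h2", "h3", "h4", "h5", "h6", "style",
    "script", "xmp", "iframe", "noembed", "noframes", "plaintext",
    "noscript", "textarea", "select", "template", "dd", "dt", "pre",
}


def balance_block(html: str) -> str:
    out = []
    stack = []   # naive open-tag stack for the current scope
    saved = []   # context stacks set aside while inside a marker
    pos = 0
    for m in TAG_RE.finditer(html):
        out.append(html[pos:m.start()])
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        if name == MARKER and not closing:
            # Close context tags from the innermost out to the outermost
            # tag named in the block definition.
            hits = [i for i, t in enumerate(stack) if t in CLOSE_BEFORE_BLOCK]
            if hits:
                out.extend(f"</{t}>" for t in reversed(stack[hits[0]:]))
                del stack[hits[0]:]
            saved.append(stack)
            stack = []
        elif name == MARKER and closing:
            # Close everything the template left open.
            out.extend(f"</{t}>" for t in reversed(stack))
            stack = saved.pop() if saved else []
        elif closing:
            if name in stack:
                while stack.pop() != name:
                    pass
        elif name not in VOID and not m.group(3).rstrip().endswith("/"):
            stack.append(name)
        out.append(m.group(0))
    out.append(html[pos:])
    return "".join(out)
```

Under these assumptions, the sketch reproduces the balancer output given for the first item in the Examples section.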

In Parsoid:

  • In the tree builder we have access to a fully accurate open-element stack, so we can emit precisely the correct close tags.
  • If/when PHP switches over to a DOM-based tidy, it might be able to use this same implementation strategy (balancing inside tidy) but it's not a requirement.

Testing

A fuzz tester has been written, based on domino, which generates random sequences of tags and text for template and context, and then evaluates whether the desired semantics hold; that is, whether the following two expressions are equal:

  • tidy(tidy(balance(context)).replace(':hole:', tidy(stripOutsideMarker(balance(template)))))
    • Context and template balanced and tidied in isolation, then template inserted via string replacement
  • tidy(tidy(balance(context.replace(':hole:', stripOutsideMarker(template)))))
    • Template inserted into context, then balanced and tidied.

In this context tidy is just an HTML5 parse and serialize. The context is expected to contain <mw:balance-TYPE>:hole:</mw:balance-TYPE> somewhere inside it. The template is also wrapped with <mw:balance-TYPE> tags. The stripOutsideMarker function removes everything outside the <mw:balance-TYPE> tag. Note that we use tidy twice in the second case, because some tidy transformations are sensitive to the number of times we've tidied -- for example, table fostering can leave nodes in positions where they will be further altered by a subsequent tidy.
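The property being fuzzed can be sketched as a small predicate. This is only an illustration and not the domino-based tester: balance and tidy are passed in as callables rather than implemented, the names substitution_commutes and strip_outside_marker are hypothetical, and the sketch assumes that stripOutsideMarker returns the content between the marker tags without the tags themselves.

```python
import re
from typing import Callable

HOLE = ":hole:"


def strip_outside_marker(html: str, mode: str) -> str:
    """Return what lies between the <mw:balance-MODE> tags (assumed shape)."""
    tag = re.escape(f"mw:balance-{mode}")
    m = re.search(rf"<{tag}>(.*)</{tag}>", html, re.DOTALL)
    if m is None:
        raise ValueError("marker tags not found")
    return m.group(1)


def substitution_commutes(
    context: str,
    template: str,
    mode: str,
    balance: Callable[[str], str],
    tidy: Callable[[str], str],
) -> bool:
    # Context and template balanced and tidied in isolation, then the
    # template is inserted via string replacement.
    separately = tidy(
        tidy(balance(context)).replace(
            HOLE, tidy(strip_outside_marker(balance(template), mode))
        )
    )
    # Template inserted into the context first, then balanced and tidied.
    together = tidy(
        tidy(balance(context.replace(HOLE, strip_outside_marker(template, mode))))
    )
    return separately == together
```

A fuzzer would call substitution_commutes with randomly generated context and template strings, a real balancer, and an HTML5 parse-and-serialize function as tidy, and report any inputs for which it returns False.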

This tool has validated the set of tags named in the formal definitions of the balance modes, as well as verifying that the "sloppy parse" done in the PHP implementation yields the same results as a precise parse would.

CAVEAT: This tester does not run the output through "legacy tidy". It is possible that the p-wrapping, empty element removal, and other nonstandard evilness performed by legacy tidy might affect the correctness of the balancing. I will hook up legacy tidy to the fuzz tester to look into this; hopefully the transition from legacy tidy to depurate will also make this consideration moot.

Examples
Here are some examples of the balance transformation:

  1. <p><a href="hello"><mw:balance-block><a href="world">foo<p></mw:balance-block>bar
    • The balancer will transform this to: <p><a href="hello"></a></p><mw:balance-block><a href="world">foo<p></p></a></mw:balance-block>bar
    • An HTML5 parse (or tidy) will transform this to: <p><a href="hello"></a></p><a href="world">foo<p></p></a>bar
    • The block balancing ensured that we didn't have an <a> tag inside an <a> tag.
    • The block balancing ensured that the inner <p> didn't implicitly close an outer <p>.
  2. <p><code><center><mw:balance-inline><span></mw:balance-inline><h1>foo
    • The balancer will transform this to: <p><code><center><span><mw:balance-inline><span></span></mw:balance-inline></span><h1>foo
    • An HTML5 parse (or tidy) will transform this to: <p><code></code></p><center><code><span><span></span></span><h1>foo</h1></code></center>
    • Note that HTML5 implicitly closes the <p> when it encounters <center>. This is why <center> is stripped inside (inline balanced) template contents.
    • Note that the HTML5 "reconstruction of active formatting element list" algorithm adds a new synthetic <code> element before the <span>. The balance algorithm adds a <span> *outside* of the template content, to trigger AFE reconstruction and ensure that AFEs of the context don't leak inside the template.

Deployment
Unmarked templates are "unbalanced" and will render exactly the same as before; they will just be slower (require more CPU time) than balanced templates.

It is expected that we will profile the "costliest"/"most frequently used/changed" templates on Wikimedia projects and attempt to add balance markers first to those templates where the greatest potential performance gain may be achieved. @tstarling noticed that adding a balance marker to [[:en:Template:Infobox]] (https://en.wikipedia.org/wiki/Template:Infobox) could affect over two million pages and have a large immediate effect on performance. We would want to carefully verify first that balance would not affect the appearance of any of those pages, using visual diff or other tools.

Related: T89331: Replace HTML4 Tidy in MW parser with an equivalent HTML5 based tool, T114072: <section> tags for MediaWiki sections.

Mailing list discussion: https://lists.wikimedia.org/pipermail/wikitech-l/2015-October/083449.html

CURRENT STATUS (2019-07-19): the parsing team's current roadmap postpones implementation of this feature until after the Parsoid/core parser integration is done. However, since the core parser uses a DOM-based tidy now (Remex), it could feasibly be done during tidy in the same way in both the legacy parser and Parsoid.

Event Timeline

Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. If the session in this task took place, please make sure 1) that the session Etherpad notes are linked from this task, 2) that followup tasks for any actions identified have been created and linked from this task, 3) to change the status of this task to "resolved". If this session did not take place, change the task status to "declined". If this task itself has become a well-defined action which is not finished yet, drag and drop this task into the "Work continues after Summit" column on the project workboard. Thank you for your help!

He7d3r added a subscriber: He7d3r.Feb 1 2016, 8:00 PM
Qgil removed a subscriber: Qgil.Feb 11 2016, 12:26 PM
RobLa-WMF triaged this task as Medium priority.
RobLa-WMF added a subscriber: RobLa-WMF.

Per E146

DStrine moved this task from Request IRC meeting to Under discussion on the TechCom-RFC board.
ssastry added a comment.EditedMar 19 2016, 2:13 AM

Here is an updated proposal for "balanced templates" based on revisiting some old notes, discussion here, and @cscott's attempts to prototype the current proposal.

TL;DR

In the current proposal as outlined in previous comments, we are trying to modify the HTML at the transclusion site so that it is possible to drop the template output as is. As @cscott is discovering with his attempts to prototype this, this can be fairly complex (and can also be hard for editors to reason about). However, instead of grappling with the complexity of the HTML5 tree building algorithm and its idiosyncrasies or HTML5's content model constraints (and the mismatches between the two which are a source of confusion), I am proposing that we identify constraints on template output and constraints on the use-site that template authors can specify. The parser then enforces both these constraints by suitably modifying the HTML output of the template and the HTML at the use site. Details below.

Detailed proposal

In this RFC, we are interested in two properties of template output: its well-formedness (balanced, well-nested HTML tags) and HTML5 content-model constraints at the transclusion site (constraints on whether the output can be introduced at the transclusion site as is). These properties affect:

  • editors' ability to reason about template output
  • their editability in HTML editors (like VE)
  • whether templates lend themselves to incremental parsing solutions

Well-formedness is easy to guarantee by building a DOM fragment from the output (string) and reserializing it. This fixes all mismatched tags, bad nesting, etc. However, if we want to localize and bound the impacts of embedding this DOM fragment within the surrounding context, we have two knobs to work with.

  1. Output constraints on the template output: These are enforceable constraints on the output of a transclusion. For example, if a template declares that it produces "block" output, we can look at the DOM fragment of transclusions that use it and wrap that DOM fragment in a <div> if its output doesn't satisfy this constraint.
  2. Use-site constraints on the surrounding context: These are constraints on where a transclusion can be embedded. For example, if a template declares it shouldn't be used inside <a> tags, any surrounding <a> tags are closed before embedding the DOM fragment.

So, the primary difference from the existing proposal is that instead of making heroic efforts in the parser to satisfy HTML5 content-model constraints, we instead let template authors specify constraints on a template output and the use site for its transclusions. This of course means that when the template output is introduced at the use site, despite these constraints, there might be non-local effects. However, I think this is acceptable. There will be a subset of templates where incremental parsing, improved reasoning, and improved editability benefits will not be available. But, with this approach, we can gradually improve the set of supported output and use-site constraints, and their complexity. So, the expectation is that over time, this subset will diminish.

Given a template with its output constraints and use-site constraints, here is how a wikitext parser might use them.

  • The HTML string is parsed to a DOM fragment which ensures that its output is well-formed.
  • Any declared output constraints are enforced. Ex. a block-output constraint will wrap the entire output in a <div> (with a special class, if necessary). Or, a no-links-output constraint will cause all <a> tags to be stripped from the output. This may be because the template author intends for the template to be used within a link.
  • Any declared use-site constraints are enforced. For example, if a template declares that it cannot be used in <a> context, any surrounding <a> tags are closed before the DOM fragment is embedded.
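The enforcement steps above could be sketched as follows. This is a simplified illustration rather than a proposed implementation: TemplateConstraints and both functions are hypothetical names, the fragment is assumed to be already well-formed (the DOM parse in the first step is skipped), link stripping uses a regular expression as a stand-in for DOM manipulation, and closing a forbidden ancestor is assumed to close everything opened inside it as well.

```python
import re
from dataclasses import dataclass

A_TAG_RE = re.compile(r"</?a(?:\s[^>]*)?>", re.IGNORECASE)


@dataclass(frozen=True)
class TemplateConstraints:
    block_output: bool = False          # output constraint: wrap in <div>
    no_links_output: bool = False       # output constraint: strip <a> tags
    forbidden_ancestors: frozenset = frozenset()  # use-site constraint


def enforce_output(fragment_html: str, c: TemplateConstraints) -> str:
    """Apply the declared output constraints to a well-formed fragment."""
    if c.no_links_output:
        fragment_html = A_TAG_RE.sub("", fragment_html)
    if c.block_output:
        fragment_html = f"<div>{fragment_html}</div>"
    return fragment_html


def close_tags_for_use_site(open_tags, c: TemplateConstraints):
    """Close tags to emit before embedding; open_tags is outermost first."""
    hits = [i for i, t in enumerate(open_tags) if t in c.forbidden_ancestors]
    if not hits:
        return []
    return [f"</{t}>" for t in reversed(open_tags[hits[0]:])]
```

With forbidden_ancestors set to {"a", "p"}, a transclusion found inside div > p > b would have </b></p> emitted before it, leaving the div open.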

Some notes about constraints:

  • Templates can provide declarations about none, one, or both of these constraints.
  • Some use-site constraints could be derived from the output constraint. For example, if a template declares an output constraint of block tags, we could decide to enforce that it cannot be used inside <p> tags or <h*> tags.
  • Some use-site constraints could be derived from the output. If a template generates output that has an <a>, <p>, or <h*> tag, we could automatically add use-site constraints that close any surrounding tags of those types.

Here are some benefits of this approach:

  • It keeps the parser end of the bargain manageable. There is very little additional complexity here. This technique is fairly simple and relies solely on a HTML5 parsing library / service for enforcing well-formedness. Enforcing template-author-declared constraints eliminates guesswork and complexity from the implementation as well.
  • An editor can look at the template documentation and figure out fairly easily and clearly where and how the template is meant to be used. There are going to be fewer surprises in terms of how rendering is affected by non-local effects of transclusions.
  • An HTML editor like VE is very well-placed in terms of enforcing use-site constraints, i.e. if a template declares that it should not be used in links, VE might prevent it from being used in a link. Because of this, it can provide stronger WYSIWYG guarantees that when the edited HTML is saved to wikitext, there are going to be fewer surprises about changes to rendering compared to how it showed up in the editor in a VE session.
  • Incremental parsability is also improved. Note that the two constraints by themselves are insufficient to guarantee that when the output of a transclusion changes (either because the parameters to the transclusion were changed, or because the template source itself was changed), we can take the new DOM fragment and install it in place of the old DOM fragment in the original HTML. However, in some constraint scenarios, we can make very reliable guarantees about this drop-in replacement of a transclusion's output.

    For example, with block level constraints, even if the template output moves around (for example, it got fostered out of a table), we know that since we are guaranteed that the edited output will still be block-level output, we can replace a <div>..</div> with another <div>..</div>, a <table> with another <table>, etc. Additionally, if we were enforcing use-site constraints of not being used inside a p-tag, we can even replace a <div>..</div> with a <p>..</p> and so on.

    Rather than try to guarantee incremental parsability in all cases upfront, we can build up this capability in the corpus of templates gradually by coming up with a sane set of workable output and use-site constraints and have templates opt into these over time.

    If a template edit changes its output or use-site constraints, then incremental parsing might have to be disabled for that edit. The pages using that template will now incur a full parse penalty. Later edits will re-enable incremental parsing.

    Note that this incremental parsing ability is only achievable in Parsoid since Parsoid maintains a mapping between DOM-fragments and wikitext offsets. So, on edits to a template, it can parse the old HTML, find the DOM fragments corresponding to the transclusion of that template, and replace it with the updated DOM fragment and serialize the DOM back to HTML. This feature cannot and will not be provided in the core PHP parser.

Questions to resolve

So, here are some things that need to be resolved / discussed:

  1. Feedback about this approach in general. Does this seem like an improved and viable approach?
  2. What are the best ways to prototype this? It seems that we could start with just one output and use-site constraint each. For example, we could use block-output (block in the HTML4 sense since that is easier to grok) (#balance:block as used in T114445#1789708). The use-site constraint could be no-p, no-h*, no-a, i.e. this template cannot be used inside <p>, <h*>, and <a> tags. We should come up with a better way to specify this.

    We need to pick a set of templates on which we could declare this output and this use-site constraint. Infoboxes seem like good candidates.
  3. Come up with a simple taxonomy / terminology / mechanism for making these output and use-site constraints. We have considered link, table, list, etc. in T114445#1789708. Anyway, we need to enumerate constraint types and write up specifications for them.
  4. Figure out where these constraints will be specified. Options are:
    • template source via magic words, parser function syntax, something else.
    • templatedata: this seems a good place for this, but template source and its constraints would now be in different places.
  5. All along, we have been very strongly leaning towards an opt-in model for templates. As far as I can tell, opt-in is the only approach that makes sense with this updated proposal.

I don't think this is actually a simplification. As noted in my prototype, the hard part here is actually determining what the "use site" of the template is. That essentially requires a full HTML5 tree builder pass. Once you've precisely identified the use site, all of the fixup strategies are essentially the same. Exposing a full use-site constraint mechanism to the user is likely to make use of templates unwieldy. As noted in my proposal above, I think block/inline/table is probably about the most this should be exposed to the user, and I expect that the first prototypes will only include the block mode.

ssastry added a comment.EditedMar 19 2016, 2:23 AM

The simplification is because you don't need to infer anything automatically. For example, the template author might specify that for infoboxes, you just need to ensure it is not inside a p-tag and that the output has to be forced to be a block tag. That eliminates the complexity of determining how to embed the template output. You just continue to use it as it has been done all along so far *after* enforcing template-author specified constraints.

Change 279670 had a related patch set uploaded (by Cscott):
WIP: Add {{#balance}} to opt-in to balanced templates

https://gerrit.wikimedia.org/r/279670

RobLa-WMF mentioned this in Unknown Object (Event).Apr 13 2016, 6:54 PM
RobLa-WMF mentioned this in Unknown Object (Event).Apr 13 2016, 7:34 PM
cscott updated the task description. (Show Details)Apr 13 2016, 7:45 PM
cscott updated the task description. (Show Details)Apr 13 2016, 8:10 PM
cscott updated the task description. (Show Details)Apr 13 2016, 8:29 PM

Updated the RFC to match the current proposed semantics and implementation.

Excuse me if I missed something in the proposal, but I'd like to raise the question of template parameters. Currently, template parameters are wikitext, and can thus contain (unbalanced) HTML tags. How should parameters be treated in balanced templates? Should each parameter be pre-parsed on its own? Or sanitized? Or do we allow plain text parameters only? Or limited wiki syntax? Structured data?...

Allowing un-balanced wikitext parameters to be used in a balanced template can break it, or at least lead to undesired results.

Restricted Application added a subscriber: TerraCodes. · View Herald TranscriptApr 19 2016, 3:11 PM
Bonvol added a subscriber: Bonvol.Jun 24 2016, 4:02 PM

Change 303431 had a related patch set uploaded (by Cscott):
WIP: Extend 'format' spec to include format strings.

https://gerrit.wikimedia.org/r/303431

cscott updated the task description. (Show Details)Oct 12 2016, 10:13 PM
cscott updated the task description. (Show Details)Oct 12 2016, 11:00 PM
jeblad added a comment.Jun 2 2017, 2:43 PM

Is there any progress?

ssastry changed the task status from Open to Stalled.Jun 7 2017, 7:21 PM

Sorry, we are pretty overcommitted and this is currently stalled till we finish up some ongoing projects.

Someone asked for a logo.

Balanced templates. Gettit?

Or the minimalist version:

{{===}}
jeblad removed a subscriber: jeblad.Aug 25 2017, 10:11 PM
Izno added a subscriber: Izno.Aug 8 2018, 2:45 AM
Krinkle moved this task from Under discussion to Backlog on the TechCom-RFC board.Mar 20 2019, 7:18 PM
Krinkle added a subscriber: Krinkle.

Moving to backlog as current status is unclear.

If the RFC has a clear desired outcome or problem statement, and resourcing commitment from a team that is interested in wider feedback, input or approval, then move it to the Inbox to let TechCom know :)

Ya, this is stalled because we don't want to do this in both parsers. But, yes we'll flag this once we are ready to pick this up again.

Alsee added a subscriber: Alsee.Apr 21 2019, 9:51 AM
Retro added a subscriber: Retro.May 8 2019, 6:35 PM
cscott updated the task description. (Show Details)Jul 19 2019, 9:31 AM

Note that the core parser has a DOM-based tidy now (Remex) and so this is more feasible to implement in the legacy parser than previously. Our current roadmap still postpones this until after Parsoid replaces the legacy parser, but it would be possible (for instance) to start recognizing the {{#balance}} parser function and emitting linter warnings from remex during tidy, as a first step.

In the recent Tech Conf there was a session about onwiki tooling, which includes templates: T234661.

The topic of balanced templates came up in these discussions a few times as a thing that may help address the concerns that some people have about the performance impact of making templates global.

Is this true? Will moving templates in the direction of being more balanced allow more stable and better performing cross-wiki transclusion?

... Also, how is this related to Scribunto modules? Is there a plan to mark them as balanced? It looks like a consensus is forming that before templates are fully global, it's a good idea to make modules global first. Can modules go global without implementing this Balanced templates RFC, or should the balancing be done first?

I see three things that are enabled by balanced templates:

  • Improving performance of re-parses when templates change. This is related to global templates only in so far as global templates could potentially mean that individual templates are used on more pages.
  • Parsing templates in a context different from the context of the local page. Balanced templates are a precondition to that, but quite a bit more work would be needed. This is where I brought up balanced templates in our conversation, but if I recall correctly, you really want the opposite - evaluation in the local context.
  • Visual editing of the template, as well as rendering of pages for editing, without having to evaluate the templates it contains. This would be very nice to have for e.g. offline editing, but I see no connection to global templates.

It seems to me like balanced templates and global templates touch on some related topics, but don't directly impact each other.

But I might be missing something, I'd be interested to hear @cscott's take.

All of that. But, to clarify your point #2 which hints at this, the important piece here is the decoupling of parsing of templates from the page that contains them. Balanced templates are a necessary but not sufficient condition to enable that. But, the decoupling means you can memoize a template's parsed output (all the way to HTML) globally across wikis. That of course requires us to be able to track usage of certain kinds of functionality that prevent that kind of memoization (time-dependent functionality, any database state like revids, page ids, random numbers, etc.). But I believe that kind of state tracking support already exists in MediaWiki.

At the level of the template, there's a flag on PPFrame for things like Cite's <references> to indicate that the output of the template itself depends on something external. But it doesn't look like that's used for things like time-dependent functionality, rev ID, and so on.

At the level of the full page parse, MediaWiki tracks time-dependent functions, access to rev IDs, and so on. But it assumes that those things are constant within the process of any one parse so it doesn't track it at the template level.

There's probably some stuff that's not tracked at all but maybe should be. For example, Scribunto calls math.randomseed( 1 ) for every top-level #invoke so math.random should usually be the same every time, but there are probably ways to get nondeterministic behavior (Reseed with os.clock()? Nested #invokes?).

Also of note is that we'll probably have to reimplement a lot of that tracking in Parsoid/PHP when we get to the point of replacing Parser.php. ;)

tstarling removed tstarling as the assignee of this task.Nov 20 2019, 10:56 PM
Krinkle removed a subscriber: Krinkle.