Behavior switch/magic word uniformity
Open, Needs Triage, Public

Description

We are adding a reliable argument quoting mechanism to {{...}} template syntax (T114432). For the sake of uniformity, it would be nice if all "macro insertion" syntactic constructs could be thought of as syntactic sugar for some invocation using the {{...}} syntax. This would reduce the number of strange and tricky argument-escape mechanisms needed: if you ever need to invoke a parser function/behavior switch/magic word/template/etc using reliable argument quoting (or need to serialize an edited instance of the same), then you can fall back to the {{...}} form. Some of these might be used infrequently enough that we can just deprecate the old forms; others are useful enough that they should remain as syntactic sugar.

Other tasks cover:

This task would be for the remaining cases, magic words and behavior switches:

  • Behavior switches:
    • __NOTOC__ => {{#notoc}} or {{#config|notoc}} or {{#set:notoc}} ?
  • Variables: w/o arguments (probably low priority to migrate):
    • {{CURRENTYEAR}} => {{#currentyear}} ? ({{#var...}} is already taken...)
    • Many of these have "optional arguments" and so fall under either the "variables with arguments" or "parser functions" categories.
  • Variables with arguments:
    • {{PROTECTIONLEVEL:action}} => {{#protectionlevel|action}}
    • {{PAGENAME:Template:Main Page}} => {{#pagename|Template:Main Page}}
    • (These are already parser functions, they just need a hash-prefixed form.)
  • Arguments and side effects:
    • {{DISPLAYTITLE:title}} => {{#displaytitle|title}}
    • {{DEFAULTSORT:sortkey|noerror}} => {{#defaultsort|sortkey|noerror}}
  • Parser functions:
    • {{PAGEID: page name }} => {{#pageid|page name}}
    • {{urlencode:string|WIKI}} => {{#urlencode|string|WIKI}}
    • {{padleft:xyz|strlen|char}} => {{#padleft|xyz|strlen|char}}
    • {{int:message name}} => {{#int|message name}}
    • {{:Title}} => {{#expand|Title}}
      • Not sure where in the codebase this is actually handled?
    • {{raw:Title}} => {{#raw|Title}} (also msg and msgnw)
      • These are special cased in the parser; probably need a closer look
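As a rough illustration of the mappings listed above, the following Python sketch rewrites a few colon-style invocations into the proposed hash-prefixed form. The name table and the `to_hash_form` helper are hypothetical, the final names are still undecided in this task, and the regex deliberately ignores nested braces and localized aliases.

```python
import re

# Hypothetical sample of legacy names; a real wiki would take these (and
# their localized aliases) from its magic word registry.
LEGACY_NAMES = {
    "DISPLAYTITLE", "DEFAULTSORT", "PAGENAME", "PROTECTIONLEVEL",
    "PAGEID", "urlencode", "padleft", "int",
}

# Matches {{NAME}} or {{NAME:args}} where args contain no braces.
_LEGACY = re.compile(r"\{\{\s*([A-Za-z]+)\s*(?::([^{}]*))?\}\}")


def to_hash_form(wikitext: str) -> str:
    """Rewrite known legacy invocations as {{#name|args}}; leave the rest."""
    def repl(match: re.Match) -> str:
        name, args = match.group(1), match.group(2)
        if name not in LEGACY_NAMES:
            return match.group(0)  # ordinary template or unknown word
        rewritten = "{{#" + name.lower()
        if args is not None:
            # The colon becomes a pipe; later pipes are already separators.
            rewritten += "|" + args.strip()
        return rewritten + "}}"

    return _LEGACY.sub(repl, wikitext)


print(to_hash_form("{{DEFAULTSORT:sortkey|noerror}}"))
```

Anything not in the table, including ordinary templates such as `{{Infobox|a=b}}`, passes through unchanged.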

Note that the use of a colon to separate the first argument is especially problematic (see T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar). Also, from a readability perspective it would be nice if all of these used the leading # character to indicate to a human reader that these are special functions, not ordinary templates---but if we're feeling lazy those forms which don't take arguments and already use {{...}} syntax (albeit without a leading #) could probably be left alone without too much harm.

The goal is to allow all these weird special cases to use standard template/argument syntax, with reliable quoting for arguments, instead of having a weird collection of ad hoc mechanisms for allowing arguments. For example, this would Just Work:

{{#urlencode|<<<
some
weird [[ string ]]
with | and = characters
>>>}}
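To make the benefit concrete, here is a minimal Python sketch of argument splitting that treats quoted text literally. It assumes a quoted argument opens with `<<<` and closes with `>>>`, as in the heredoc forms discussed in this task. The `split_args` helper is hypothetical, it ignores nested `{{...}}`, and it leaves open whether whitespace just inside the quotes is trimmed.

```python
def split_args(body: str) -> list[str]:
    """Split the text between {{ and }} on pipes outside <<< ... >>> quotes."""
    args: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(body):
        if body.startswith("<<<", i):
            end = body.find(">>>", i + 3)
            if end == -1:
                raise ValueError("unterminated <<< quote")
            # Everything inside the quote is literal, pipes and '=' included.
            current.append(body[i + 3:end])
            i = end + 3
        elif body[i] == "|":
            # A pipe outside a quote ends the current argument.
            args.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    args.append("".join(current))
    return args


print(split_args("#urlencode|<<<with | and = characters>>>"))
```

The first element is the function name and the rest are its arguments, so the quoted pipe never splits the string.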

Event Timeline

If {{#var:...}} is an actual suggestion for a parser function name (i.e. a parser function named the exact string #var, instead of #var just being a stand-in for some to-be-determined string), keep in mind that this name is already used by Extension:Variables. Some of the other suggested names might similarly conflict with other extensions, though I don't know of any off-hand.

@Dinoguy1000 thanks! Yeah, this is still a WIP, I'm floating it now especially to get feedback on what appropriate names might be. These names can be/are localized, too, so that's got to be considered when thinking of new names.

Note that double-underscore magic words are automatically added as page properties (in the page_props table). You should be sure not to lose that behavior if someone uses your alternative syntax. Considering that they can't structurally have arguments, the justification for changing them given here ("with reliable quoting for arguments, instead of having a weird collection of ad hoc mechanisms for allowing arguments") doesn't seem to apply.

Rather than changing wikitext syntax for cases like {{DISPLAYTITLE:title}} => {{#displaytitle|title}} where the only real difference is the colon separating the name from the first argument, you might find it an easier sell to simply extend that construct with your reliable quoting for arguments too.

At least in the PHP parser, all the constructs delimited with {{ }}, {{{ }}}, [[ ]], and -{ }- share the same parsing code, so not extending the new quoting to them all might actually take more work than doing so (if only management of a flag to indicate when it should/shouldn't). The only edge case there is the handling of that first "argument" after the colon.

Also complicating that is the fact that the parser doesn't know whether {{{{foo| is going to be interpreted as {{ followed by {{foo| or { followed by {{{foo| until it gets to the matching }} or }}}, at which point any pipe-separated arguments have already been parsed.
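The brace-run ambiguity can be modelled in a few lines of Python. This is a simplified illustration of the point being made, not the actual preprocessor code, and `resolve_brace_run` is a hypothetical helper: the innermost construct can only be identified once the length of the closing run is known.

```python
def resolve_brace_run(open_count: int, close_count: int) -> tuple[int, int]:
    """Return (matched, leftover_open) for the innermost construct.

    matched is 3 for a {{{parameter}}}, 2 for a {{template}}, and 0 when
    the closing run is too short to close anything.
    """
    matched = min(open_count, close_count, 3)
    if matched < 2:
        return 0, open_count
    # The innermost braces are consumed; the rest stay open to the left.
    return matched, open_count - matched


# {{{{foo|...}}  -> a template, with {{ still open before it
print(resolve_brace_run(4, 2))
# {{{{foo|...}}} -> a parameter, with a single { left before it
print(resolve_brace_run(4, 3))
```

In both cases the pipe-separated arguments sit between the runs and are scanned before the closing run is seen, which is why a quoting flag cannot depend on which construct it turns out to be.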

Also at the stage where it handles parsing the arguments from the wikitext it doesn't know whether {{foo:bar is going to be referring to a variable or parser function named "foo" (with "bar" as the parameter) versus Template:Foo:bar versus page 'bar' in namespace Foo (versus, maybe, interwiki transclusion from site 'foo').

> Note that double-underscore magic words are automatically added as page properties (in the page_props table). You should be sure not to lose that behavior if someone uses your alternative syntax. Considering that they can't structurally have arguments, the justification for changing them given here ("with reliable quoting for arguments, instead of having a weird collection of ad hoc mechanisms for allowing arguments") doesn't seem to apply.

Yeah, the argument for adding alternatives for behavior switches is not as strong. It would mean that you could express every page with a simplified wikitext syntax which just used the {{#...}} forms, which could be interesting if you shipped wikitext to a client and did client-side parsing, for example. But that's a different justification than given for the other cases, and more of a reach. I think we'd also entertained thoughts of removing these behavior switches entirely and just setting the page props directly, maybe using MCR, the way we yanked interlanguage links out of the wikitext. So maybe the problem solves itself that way.

> Rather than changing wikitext syntax for cases like {{DISPLAYTITLE:title}} => {{#displaytitle|title}} where the only real difference is the colon separating the name from the first argument, you might find it an easier sell to simply extend that construct with your reliable quoting for arguments too.
>
> At least in the PHP parser, all the constructs delimited with {{ }}, {{{ }}}, [[ ]], and -{ }- share the same parsing code, so not extending the new quoting to them all might actually take more work than doing so (if only management of a flag to indicate when it should/shouldn't). The only edge case there is the handling of that first "argument" after the colon.

It's an interesting idea, but I'd prefer not to support {{foo:<<<....>>>}}, at least initially. You'd be more likely to convince me that {{foo|<<<...>>>}} bar is reasonable, but then you run into some of the ambiguities with a page named foo. I feel like as a solid basis for the future we'd be better off trying to standardize these forms, using a hash consistently to indicate to humans that these are special functions, and using the improved mechanisms of the heredoc forms as the carrot to motivate conversion. (And it sets the stage for defining a simplified wikitext which could be shipped client-side in the future, etc.)

> Also at the stage where it handles parsing the arguments from the wikitext it doesn't know whether {{foo:bar is going to be referring to a variable or parser function named "foo" (with "bar" as the parameter) versus Template:Foo:bar versus page 'bar' in namespace Foo (versus, maybe, interwiki transclusion from site 'foo').

Well, this is one reason why {{#foo|bar}} is preferable to {{foo:bar}}, right? It gives the parser a good hint, and it avoids its having to guess whether the thing after the colon is actually an argument or not.
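The difference can be sketched in Python as a target classifier. All three lookup sets are hypothetical sample data, and the precedence order among them is an assumption made for illustration (the special `{{:Title}}` form is not handled). The point is only that a leading `#` settles the question without consulting any of them.

```python
# Hypothetical sample data, not a real wiki's configuration.
FUNCTIONS = {"displaytitle", "pagename", "urlencode"}
NAMESPACES = {"template", "help", "user"}
INTERWIKIS = {"commons", "meta"}


def classify(target: str) -> str:
    """Guess what the text before the first pipe in {{...}} refers to."""
    if target.startswith("#"):
        # Hash prefix: a function call, decided with no lookups at all.
        return "function"
    head, colon, _rest = target.partition(":")
    if not colon:
        return "template"
    key = head.strip().lower()
    if key in FUNCTIONS:
        return "function"
    if key in NAMESPACES:
        return "page-in-namespace"
    if key in INTERWIKIS:
        return "interwiki"
    # Fallback: a template whose title happens to contain a colon.
    return "template"


for target in ("#foo", "urlencode:x", "Help:Contents", "Foo:bar"):
    print(target, "->", classify(target))
```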

> I think we'd also entertained thoughts of removing these behavior switches entirely and just setting the page props directly, maybe using MCR, the way we yanked interlanguage links out of the wikitext.

That won't work for __TOC__, since that one specifies that the table of contents be inserted at a specific position in the page.

It looks like it should work for all the rest, at least in core.
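A small Python sketch of that distinction: most switches reduce to page-level flags, while `__TOC__` also carries a position. The switch list is a hypothetical subset, the empty-string flag value is a placeholder assumption, and `extract_switches` is not a MediaWiki API.

```python
import re

# Hypothetical subset of behavior switches.
_SWITCH = re.compile(r"__(NOTOC|TOC|NOINDEX|HIDDENCAT)__")


def extract_switches(wikitext: str):
    """Return (page_flags, toc_offset) for the switches found in the text."""
    flags = {}
    toc_offset = None
    for match in _SWITCH.finditer(wikitext):
        name = match.group(1).lower()
        flags[name] = ""  # placeholder value (assumption)
        if name == "toc" and toc_offset is None:
            # Only the first __TOC__ position is kept in this sketch.
            toc_offset = match.start()
    return flags, toc_offset


print(extract_switches("Intro __NOTOC__ text"))
print(extract_switches("a __TOC__ b"))
```

The flags could live outside the wikitext; the offset cannot, which is why `__TOC__` is the exception.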

> It's an interesting idea, but I'd prefer not to support {{foo:<<<....>>>}}, at least initially.

I don't think that specific case would be possible without a lot of hackery, but {{foo:bar|<<<....>>>}} would.

> using a hash consistently to indicate to humans that these are special functions,

To a lot of humans there's not really much "special" about them versus normal templates, really. Enwiki has equivalent template versions of several of them to cater to humans who can't remember that it's {{DEFAULTSORT:Foo}} rather than {{DEFAULTSORT|Foo}}.

Personally, I don't know if there's actually a pattern to which ones use the hash and which don't.

> and using the improved mechanisms of the heredoc forms as the carrot to motivate conversion. (And it sets the stage for defining a simplified wikitext which could be shipped client-side in the future, etc.)

I'm not sure that "conversion" is actually a worthwhile goal. Especially {{:Title}} => {{#expand|Title}} seems pretty pointless, and the former syntax is already getting your heredoc form.

I'm cheating and squeezing in other parts of my Evil Master Plan with that {{#expand}} thing. The goal there is to refactor the template expansion away from the preprocessor, which would just be a dispatcher. Of course I can do that without actually making {{#expand}} callable by mere mortals... but why not?

And *completely* uniform syntax really belongs to T204375: Wikitext 2.0 as low-bandwidth transport for client-side rendering; it is not strictly necessary for sane argument quoting, as we discussed above. I should probably split the task description to make it clear where the division between "useful" and "only important for evil master plan" is.

The preprocessor (at least the PHP one) already dispatches template transclusions.

@Anomie The refactoring I am proposing would just make that distinction clearer: the "dispatcher" component would have zero wikitext specifics baked into it, so the "parse the wikitext looking for template invocations" code would be completely separate from the "register a namespace and dispatch invocations with arguments" code.
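One way to picture the proposed split is a dispatcher that only maps names to handlers and never sees wikitext. The Python below is a sketch under that reading; the `Dispatcher` class and the simplified `pad_left` handler (single-character padding only) are hypothetical, not MediaWiki code.

```python
from typing import Callable, Dict, List

Handler = Callable[[List[str]], str]


class Dispatcher:
    """Maps names to handlers; knows nothing about wikitext syntax."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name.lower()] = handler

    def dispatch(self, name: str, args: List[str]) -> str:
        try:
            handler = self._handlers[name.lower()]
        except KeyError:
            raise LookupError(f"no handler registered for {name!r}") from None
        return handler(args)


def pad_left(args: List[str]) -> str:
    """Simplified padleft: pads with one character, '0' by default."""
    text, width = args[0], int(args[1])
    fill = args[2][0] if len(args) > 2 and args[2] else "0"
    return text.rjust(width, fill)


dispatcher = Dispatcher()
dispatcher.register("padleft", pad_left)
print(dispatcher.dispatch("padleft", ["xyz", "5", "_"]))
```

Any front end that can produce a name and an argument list, whether a preprocessor or a combined parser, could call `dispatch` the same way.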

Parsoid doesn't have a separate preprocessor; it parses the template invocations at the same time it parses the rest of the wikitext syntax. But it would still need to plug in to the dispatch logic.

It doesn't seem that way to me. It seems like you're adding a pre-preprocessor that transforms the wikitext into some simplified pseudo-wikitext so you can have the preprocessor handle pseudo-wikitext instead. I don't see what that really gains us besides your hope that the "some simplified pseudo-wikitext" could be used by your T204375 scheme, at the cost of another pass and whatever extra time that might take.

The preprocessor refactoring is a distraction, and it's not needed for this task. (And the dispatching actually occurs in Parser::doBraceExpansion / Parser::doDoubleUnderscoreExpansion / etc, not the preprocessor.)

>   • Variables: w/o arguments (probably low priority to migrate):
>     • {{CURRENTYEAR}} => {{#currentyear}} ? ({{#var...}} is already taken...)

What about {{#env: ... }}, e.g.:

Old Name               New Name
{{CURRENTYEAR}}        {{#env:currentyear}}
{{CURRENTMONTH}}       {{#env:currentmonth}}
{{CURRENTMONTHNAME}}   {{#env:currentmonthname}}
{{CURRENTDAY}}         {{#env:currentday}}

etc.


I’m suggesting {{#env: ... }} because it’s a shortening of “environment variable”, and matches what CSS does with its env() function.

It would also ensure that new environment variables won’t conflict with templates or extension parser functions.

It occurs to me that [[Category:Foo]] is another special case of "metadata-altering wikitext", and one might consider using {{#category|Foo|sortkey}} as a uniform syntax for this.

I am not happy with the viewpoint suggested by {{#env:}} or {{#var:}}.

All these are functions, and they are retrieving the current state by calling that parser function.

The very old view has been that parser functions are constants, which have one and only one value, or that they are plain variables to be set. This is obviously incorrect. Only {{!}} is a constant.

E.g., {{REVISIONUSER}} is a {{#env:}} or {{#var:}} according to the recent proposal. However, {{REVISIONUSER:Main Page}} is a function call with a parameter. Quite confusing.

Almost every “constant” may be equipped with a parameter, becoming an obvious function call. {{CURRENTYEAR}} appears to be a constant value, 2022 for a while. However, one day this might be extended by a parameter indicating Hebrew or Islamic or Chinese calendar system. {{CURRENTMONTHNAME}} is supposed to appear in project (or page???) language, but might get a parameter for language, and also Hebrew or Islamic or Chinese calendar system.

Anyway, {{#currentyear}} is fine with me, and I would be glad to migrate current programming towards this equivalent.
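The "everything is a function" view can be shown with a short Python sketch in which one handler serves both the no-argument and the one-argument form. The `revision_user` helper and its lookup table are hypothetical stand-ins for a real revision lookup.

```python
from typing import Mapping, Optional


def revision_user(last_editor: Mapping[str, str], current_page: str,
                  title: Optional[str] = None) -> str:
    """One handler for both {{REVISIONUSER}} and {{REVISIONUSER:title}}.

    With no title the call reads the state of the page being parsed, so
    the bare form is a function call too, just with a defaulted argument.
    """
    page = title if title else current_page
    return last_editor.get(page, "")


# Hypothetical sample data.
editors = {"Main Page": "Alice", "Sandbox": "Bob"}
print(revision_user(editors, "Sandbox"))               # {{REVISIONUSER}}
print(revision_user(editors, "Sandbox", "Main Page"))  # {{REVISIONUSER:Main Page}}
```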

Change 819148 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Unify the "magic variable" and "parser function" form of several built-ins

https://gerrit.wikimedia.org/r/819148

Change 819167 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] CoreMagicVariables/CoreParserFunctions: unify revisiontimestamp & etc

https://gerrit.wikimedia.org/r/819167

Change 819199 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] CoreMagicVariables/CoreParserFunction: unify revisionuser

https://gerrit.wikimedia.org/r/819199

Change 819200 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] CoreMagicVariables/CoreParserFunction: unify revisionid

https://gerrit.wikimedia.org/r/819200

Change 819148 merged by jenkins-bot:

[mediawiki/core@master] Unify the "magic variable" and "parser function" form of several built-ins

https://gerrit.wikimedia.org/r/819148

Change 833451 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Unify no-arg and 1-arg forms of {{REVISIONTIMESTAMP}} and friends

https://gerrit.wikimedia.org/r/833451

Change 819167 merged by jenkins-bot:

[mediawiki/core@master] CoreMagicVariables/CoreParserFunctions: unify revisiontimestamp & etc

https://gerrit.wikimedia.org/r/819167

Change 833451 merged by jenkins-bot:

[mediawiki/core@master] Unify no-arg and 1-arg forms of {{REVISIONTIMESTAMP}} and friends

https://gerrit.wikimedia.org/r/833451

Change 819199 merged by jenkins-bot:

[mediawiki/core@master] CoreMagicVariables/CoreParserFunction: unify revisionuser

https://gerrit.wikimedia.org/r/819199

Change 819200 merged by jenkins-bot:

[mediawiki/core@master] CoreMagicVariables/CoreParserFunction: unify revisionid

https://gerrit.wikimedia.org/r/819200