
Serializing extension tags using TemplateData
Open, Needs Triage · Public

Description

The tag-style syntax:

<ext foo="bar">
baz
</ext>

can be mapped to template-style syntax:

{{#tag:ext|baz|foo=bar}}

This mapping plays very nicely with heredoc arguments (T114432):

{{#tag:ext|<<<
baz
>>>|foo=<<<
bar
>>>}}

...which allow the body of the extension tag to contain arbitrary text, but with improved escape mechanisms -- extension tags can be nested, for example.
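As a rough illustration of the mapping above, here is a minimal Python sketch that renders a tag name, body and attributes in the heredoc form. The function name `to_tag_function` is hypothetical, and the handling of a value that itself contains `>>>` is an assumption (the sketch simply refuses it), since the escape rules are still under discussion in T114432.

```python
def to_tag_function(name, body, attrs):
    """Render an extension tag as a {{#tag:...}} call with heredoc arguments."""

    def heredoc(value):
        # Assumption: a value containing the closing delimiter cannot be
        # quoted this way, so the sketch rejects it instead of guessing.
        if ">>>" in value:
            raise ValueError("value contains the heredoc terminator '>>>'")
        return "<<<\n" + value + "\n>>>"

    # Content goes first, then the attributes as named arguments.
    parts = ["#tag:" + name, heredoc(body)]
    parts += [key + "=" + heredoc(value) for key, value in attrs.items()]
    return "{{" + "|".join(parts) + "}}"


print(to_tag_function("ext", "baz", {"foo": "bar"}))
```

For the `<ext foo="bar">` example this prints the same five-line heredoc form shown above.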

Given this mapping, it would be useful for TemplateData to apply to extension tags as well, so that if, for example, the caption attribute of the <gallery> extension were parsed as full wikitext (instead of some strange almost-wikitext) it could be described as:

<templatedata>
{
	"description": "Gallery extension",
	"params": {
		"1": {
			"label": "Gallery body",
			"type": "string",
			"required": true
		},
		"caption": {
			"label": "Caption",
			"type": "content",
			"description": "The caption for the entire gallery"
		}
	}
}
</templatedata>

This would also guide/simplify serialization. For simple values of attributes (no newlines, no quotes), we'd use the <ext attr="..."> syntax for the extension, but as soon as the attribute value got "complicated" we could switch to the {{#tag:ext||attr=<<<...>>>}} syntax.
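A minimal sketch of that serialization rule, in Python. The name `serialize_extension` and the exact test for "simple" (no newlines, no double quotes) are assumptions taken from the paragraph above, not from any existing serializer.

```python
def serialize_extension(name, body, attrs):
    """Pick tag syntax for simple attributes, {{#tag:...}} with heredocs otherwise."""

    def is_simple(value):
        # "Simple" as described above: no newlines and no double quotes.
        return "\n" not in value and '"' not in value

    if all(is_simple(value) for value in attrs.values()):
        attr_text = "".join(' {}="{}"'.format(k, v) for k, v in attrs.items())
        return "<{0}{1}>{2}</{0}>".format(name, attr_text, body)

    # Fallback: the content comes first (left empty when there is none),
    # then every attribute is quoted with a heredoc.
    content = "<<<" + body + ">>>" if body else ""
    parts = ["#tag:" + name, content]
    parts += ["{}=<<<{}>>>".format(k, v) for k, v in attrs.items()]
    return "{{" + "|".join(parts) + "}}"


print(serialize_extension("ext", "baz", {"foo": "bar"}))
print(serialize_extension("ext", "", {"attr": 'say "hi"'}))
```

The first call stays in tag syntax; the second falls back to the `{{#tag:ext||attr=<<<...>>>}}` form because the attribute value contains a double quote.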


In theory this might also encourage tags like <gallery> to move more of their parameters into attributes, instead of writing their own bespoke parsers of the freetext body of the tag. So optionally instead of:

<gallery>
File:Detroit Publishing Co. - A Yeoman of the Guard (N.B. actually a Yeoman Warder), full restoration.jpg|1
File:Official_program_-_Woman_suffrage_procession_March_3,_1913_-_crop.jpg|2
File:Thurston, the famous magician - East Indian Rope Trick.jpg|3
File:Joseph Ferdinand Keppler - The Pirate Publisher - Puck Magazine - Restoration by Adam Cuerden.jpg|4
</gallery>

we might eventually see:

{{#tag:gallery
|<<<File:Detroit Publishing Co. - A Yeoman of the Guard (N.B. actually a Yeoman Warder), full restoration.jpg|1>>>
|<<<File:Official_program_-_Woman_suffrage_procession_March_3,_1913_-_crop.jpg|2>>>
|<<<File:Thurston, the famous magician - East Indian Rope Trick.jpg|3>>>
|<<<File:Joseph Ferdinand Keppler - The Pirate Publisher - Puck Magazine - Restoration by Adam Cuerden.jpg|4>>>
}}

...or ultimately something similar that lets you specify the filename as TemplateData type wiki-file-name and the caption as TemplateData type content. But that would require varargs support in TemplateData.
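To make the rewrite concrete, here is a small Python sketch that turns a gallery body into the positional-heredoc form proposed above, plus a helper that splits one line into filename and caption. Both function names are hypothetical, and the output is only the form proposed in this task.

```python
def split_item(line):
    """Split one gallery line into (filename, caption) at the first pipe."""
    filename, _, caption = line.partition("|")
    return filename.strip(), caption.strip()


def gallery_to_tag_function(body):
    """Wrap each non-empty gallery line in a heredoc positional argument."""
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    args = "".join("\n|<<<" + line + ">>>" for line in lines)
    return "{{#tag:gallery" + args + "\n}}"


body = "File:A.jpg|1\nFile:B.jpg|2\n"
print(gallery_to_tag_function(body))
print(split_item("File:A.jpg|1"))
```

The per-item split is where a varargs-aware TemplateData could attach the wiki-file-name and content types.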

Event Timeline

Restricted Application added a subscriber: Aklapper.

Yeah. I guess this part of the bug could (eventually) be made more specific to "serializing extensions using TemplateData (and heredoc syntax)".

I hadn't thought too hard about how the extension might provide the TemplateData. It might be done in roughly the same way messages are provided, so the extension provides a default TemplateData object which can then be overridden on a per-wiki basis as needed (most likely to further localize interface strings)? But that's probably part of T54607, not this bug.

This task was more specifically an answer to the question posed in T187958 (Parsoid and PHP parser parse <gallery caption="…"> differently): whether or not attributes included as part of the extension tag syntax could/should include fully general wikitext. My initial intuition was that this was an abomination, because it would introduce Yet Another Crazy Wikitext Subset With Strange Escape Rules -- what if a template included a double-quote? do we really want folks including newlines in attribute values? what does the start-of-line context look like? etc. -- but the idea that we could (conceptually) map the <tag> extension syntax back to template-style syntax with heredoc arguments for sane escaping made this seem like a reasonable idea after all. If extension attributes are just template parameters, then we don't have to define a special TemplateData type for "wikitext, but not too crazy"; we just use "type":"content". And we don't have to worry about how to serialize crazy attribute values: when confronted with the apocalypse, we'll just fall back to serializing using the {{#tag}} syntax, which we already know how to escape.

cscott renamed this task from "TemplateData for extension tags" to "Serializing extension tags using TemplateData". · Sep 13 2018, 10:48 PM
cscott updated the task description.

Quick note (which maybe belongs better with T90914: Provide semantic wiki-configurable styles for media display) that ideally we could unify media layout options with templatedata as well; something like:

{{#media|File:Foobar.jpg|caption=baz}}

would be the desugaring of [[File:Foobar.jpg|baz]]. This would allow more interesting wikitext to be easily embedded in captions as well.
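A minimal sketch of that desugaring in Python, covering only the simplest case of a file target plus a single caption. `{{#media}}` is the hypothetical parser function named above; the function name `desugar_file_link` and the decision to reject links with extra options (thumb, left, and so on) are assumptions, because how those options would map is not settled here.

```python
import re

# Matches only [[File:Name|caption]] with exactly one pipe-separated part.
FILE_LINK = re.compile(r"^\[\[(File:[^|\]]+)\|([^|\]]*)\]\]$")


def desugar_file_link(link):
    """Rewrite a simple file link into the hypothetical {{#media}} form."""
    match = FILE_LINK.match(link)
    if match is None:
        # Links with layout options are out of scope for this sketch.
        raise ValueError("not a simple [[File:...|caption]] link")
    target, caption = match.groups()
    return "{{#media|" + target + "|caption=" + caption + "}}"


print(desugar_file_link("[[File:Foobar.jpg|baz]]"))
```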

we might eventually see:

{{#tag:gallery
|<<<File:Detroit Publishing Co. - A Yeoman of the Guard (N.B. actually a Yeoman Warder), full restoration.jpg|1>>>
|<<<File:Official_program_-_Woman_suffrage_procession_March_3,_1913_-_crop.jpg|2>>>
|<<<File:Thurston, the famous magician - East Indian Rope Trick.jpg|3>>>
|<<<File:Joseph Ferdinand Keppler - The Pirate Publisher - Puck Magazine - Restoration by Adam Cuerden.jpg|4>>>
}}

Personally, I don't see that as being better. I find it more confusing, not less.

Quick note (which maybe belongs better with T90914: Provide semantic wiki-configurable styles for media display) that ideally we could unify media layout options with templatedata as well; something like:

{{#media|File:Foobar.jpg|caption=baz}}

would be the desugaring of [[File:Foobar.jpg|baz]]. This would allow more interesting wikitext to be easily embedded in captions as well.

While the existing [[File:Foobar.jpg|baz]] syntax is pretty bad with the multitude of magic parameters, I don't think turning every file usage into a "#media" parser function is much of an improvement for the common use case, and we'd likely need to keep the existing syntax more or less the same forever for existing users.

Another option would be to think of file links as template transclusions that use square brackets instead of curly brackets for historical reasons:

[[File:Example.svg|thumb|left|<<<
Fancy wikitext here
>>>]]

This would also guide/simplify serialization. For simple values of attributes (no newlines, no quotes), we'd use the <ext attr="..."> syntax for the extension, but as soon as the attribute value got "complicated" we could switch to the {{#tag:ext|attr=<<<...>>>}} syntax.

Note that the two are not necessarily equivalent.

With <ext attr="{{bar}}">, the tag hook function gets "{{bar}}" as the value for 'attr'. With {{#tag:ext|attr={{bar}}}} on the other hand, it gets the value with Template:bar expanded.

Whether {{#tag:ext|attr=<<<{{bar}}>>>}} expands the template or not is what's being discussed at T114432#4574137.
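The difference can be modelled with a toy Python sketch. Everything here is hypothetical: `TEMPLATES` stands in for the wiki's templates, `expand` is a stand-in for real template expansion, and the heredoc case is deliberately left out because its behaviour is the open question in T114432.

```python
import re

# Hypothetical template store: Template:bar expands to this text.
TEMPLATES = {"bar": "expanded text"}


def expand(text):
    """Toy expansion: replace {{name}} with the stored template text."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: TEMPLATES.get(m.group(1), m.group(0)),
        text,
    )


def attr_seen_by_hook(value, via_tag_function):
    """Return the attribute value the tag hook would receive."""
    # Tag syntax passes the raw text; {{#tag:...}} expands it first.
    return expand(value) if via_tag_function else value


print(attr_seen_by_hook("{{bar}}", via_tag_function=False))
print(attr_seen_by_hook("{{bar}}", via_tag_function=True))
```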

we might eventually see:

{{#tag:gallery
|<<<File:Detroit Publishing Co. - A Yeoman of the Guard (N.B. actually a Yeoman Warder), full restoration.jpg|1>>>
|<<<File:Official_program_-_Woman_suffrage_procession_March_3,_1913_-_crop.jpg|2>>>
|<<<File:Thurston, the famous magician - East Indian Rope Trick.jpg|3>>>
|<<<File:Joseph Ferdinand Keppler - The Pirate Publisher - Puck Magazine - Restoration by Adam Cuerden.jpg|4>>>
}}

Personally, I don't see that as being better. I find it more confusing, not less.

@Anomie I agree the gallery syntax isn't great. The big issue there is that we don't have good syntax for varargs. Really what you want to do is file1=....|caption1=<<<....>>>, which is an improvement in that it lets you reliably write "complicated" wikitext captions without having to worry about breaking stuff. But then we have to localize file1, caption1, etc. and deal with different localized numeral sets. It's a mess. We should fix that, but that's probably a separate task (EDIT: I created one! T204366: Better varargs for templates). My point is that you could use this syntax buried deep inside a template or Scribunto module to get the benefits of predictable escaping, not that we'd want human beings to actually write this regularly.

While the existing [[File:Foobar.jpg|baz]] syntax is pretty bad with the multitude of magic parameters, I don't think turning every file usage into a "#media" parser function is much of an improvement for the common use case, and we'd likely need to keep the existing syntax more or less the same forever for existing users.

Another option would be to think of file links as template transclusions that use square brackets instead of curly brackets for historical reasons:

[[File:Example.svg|thumb|left|<<<
Fancy wikitext here
>>>]]

I'm trying to *avoid* adding heredocs to square bracket syntax, but I don't plan on deprecating square bracket syntax. I'd like to think of it as just sugar for the "long" {{#media|...}} form. Use it for conciseness in the common case and wherever possible, but as soon as you start to want to do "complicated" things in the caption or attributes, it's time to switch to the reliable quoting mechanism of {{#media....}}. I feel like I'm most likely to try to sneak in a better style mechanism at the same time I define #media, so further discussion of that will probably occur in T90914.

This would also guide/simplify serialization. For simple values of attributes (no newlines, no quotes), we'd use the <ext attr="..."> syntax for the extension, but as soon as the attribute value got "complicated" we could switch to the {{#tag:ext|attr=<<<...>>>}} syntax.

Note that the two are not necessarily equivalent.

With <ext attr="{{bar}}">, the tag hook function gets "{{bar}}" as the value for 'attr'. With {{#tag:ext|attr={{bar}}}} on the other hand, it gets the value with Template:bar expanded.

Whether {{#tag:ext|attr=<<<{{bar}}>>>}} expands the template or not is what's being discussed at T114432#4574137.

You're totally correct about the expansion differences; I think it's important for the use of {{#tag:pre}} (for example) that the parser function (optionally, at least) can get the unexpanded argument. Discussion on that can continue in T114432.

I'm trying to *avoid* adding heredocs to square bracket syntax, but I don't plan on deprecating square bracket syntax. I'd like to think of it as just sugar for the "long" {{#media|...}} form. Use it for conciseness in the common case and wherever possible, but as soon as you start to want to do "complicated" things in the caption or attributes, it's time to switch to the reliable quoting mechanism of {{#media....}}. I feel like I'm most likely to try to sneak in a better style mechanism at the same time I define #media, so further discussion of that will probably occur in T90914.

That's a good point and I agree with you there.

The tag-style syntax:

<ext foo="bar">
baz
</ext>

can be mapped to template-style syntax:

{{#tag:ext|foo=bar|baz}}

Actually, the syntax is:

{{#tag:ext|baz|foo=bar}}

Attributes go after the content; see mw:Help:Magic words § #tag.


Also, self‑closing tags like:

<ref name="foo" />

need two pipes:

{{#tag:ref||name=foo}}
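Both corrections (content before attributes, and an empty content slot for self-closing tags) fit in one small Python sketch. The function name `tag_function` is hypothetical, and no escaping of the values is attempted here.

```python
def tag_function(name, attrs, body=None):
    """Build a plain {{#tag:...}} call: content first, then attributes."""
    # A self-closing tag has no content, so the first argument is left
    # empty; that is what produces the two consecutive pipes.
    content = "" if body is None else body
    parts = ["#tag:" + name, content]
    parts += ["{}={}".format(key, value) for key, value in attrs.items()]
    return "{{" + "|".join(parts) + "}}"


print(tag_function("ext", {"foo": "bar"}, "baz"))
print(tag_function("ref", {"name": "foo"}))
```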