RFC Meeting: Hygienic transclusions and balanced templates (2016-04-13, #wikimedia-office)

Hosted by daniel on Apr 13 2016, 9:00 PM - 10:00 PM.


See the Architecture meetings page for more general information about this meeting (also: Phab query: list of upcoming RFC meetings, Phab query: list of all RFC meetings).

Recurring Event

Event Series
This event is an instance of E66: ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office), and repeats every week.

Event Timeline

@tstarling suggested in E158 that T130567 followed by T114445 would be good for next week, based on the discussion in the #mediawiki-parsoid channel.

Added to Tech/News as [improving transclusion of templates for Parsoid] and [balanced templates] (a subtask)

RobLa-WMF renamed this event from RFC Meeting: <topic TBD> (<see "Starts" field>, #wikimedia-office) to RFC Meeting: Hygienic transclusions and balanced templates (2016-04-13, #wikimedia-office). Apr 6 2016, 10:13 PM
RobLa-WMF updated the event description.
RobLa-WMF mentioned this in Unknown Object (Event). Apr 13 2016, 6:54 PM

(FWIW, I was hoping that I had buried the "hygiene" terminology.)

Transcript links

Meeting started by TimStarling at 21:01:35 UTC.

Meetbot Summary

Meeting ended at 22:02:53 UTC.

People present (lines said)

  • cscott (83)
  • gwicke (59)
  • TimStarling (41)
  • subbu (41)
  • DanielK_WMDE_ (29)
  • robla (8)
  • stashbot (5)
  • wm-labs-meetbot` (3)
  • Krinkle (2)
  • Scott_WUaS (1)
  • YairRand (1)
  • Alsee (1)

Full log from this week:

21:01:35 <TimStarling> #startmeeting E159
21:01:35 <wm-labs-meetbot`> Meeting started Wed Apr 13 21:01:35 2016 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:35 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:35 <wm-labs-meetbot`> The meeting name has been set to 'e159'
21:01:51 <TimStarling> #topic Hygienic transclusions and balanced templates | RFC meeting | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:01:57 <cscott> hello hello
21:02:01 <robla> o/
21:02:10 <robla> #link https://phabricator.wikimedia.org/E159
21:02:15 <gwicke> hi
21:02:30 <robla> #link https://phabricator.wikimedia.org/T130567
21:03:04 <robla> #info T130567 "WIP RFC: Hygienic transclusions for WYSIWYG, incremental parsing & composition: Options and trade-offs"
21:03:05 <stashbot> T130567: WIP RFC: Hygienic transclusions for WYSIWYG, incremental parsing & composition: Options and trade-offs - https://phabricator.wikimedia.org/T130567
21:03:42 <subbu> Related: https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability
21:03:53 <gwicke> I wrote that RFC with the hope of providing a high-level overview of the problem space, a possible set of requirements, as well as options under discussion
21:04:33 <gwicke> the problem of making content and transclusions in particular compose well is a very general and important one
21:05:40 <gwicke> it affects VE's ability to faithfully preview transclusions, parsing performance, as well as the ability to compose content dynamically for specific use cases
21:06:23 <robla> gwicke: my understanding of things from our conversations is that T130567 describes the general problem, and then T114445 describes one possible solution to the problem described in T130567. Is that right?
21:06:23 <stashbot> T114445: [RFC] Balanced templates - https://phabricator.wikimedia.org/T114445
21:06:23 <stashbot> T130567: WIP RFC: Hygienic transclusions for WYSIWYG, incremental parsing & composition: Options and trade-offs - https://phabricator.wikimedia.org/T130567
21:06:53 <gwicke> robla: yes
21:07:02 <cscott> more or less
21:07:16 <cscott> T114445 describes a solution to part of the general issue
21:07:16 <stashbot> T114445: [RFC] Balanced templates - https://phabricator.wikimedia.org/T114445
21:07:24 <DanielK_WMDE_> I think one core issue is switching from transclusion based on wikitext, to transclusion based on a HTML DOM. From transclusion before parsing, to composition after (or during) parsing.
21:07:35 <cscott> i think gwicke's RFC is intended to be a broader statement, including future directions for tools, etc.
21:08:04 <DanielK_WMDE_> This would allow us to treat templates, parser functions, magic words, special page transclusions, media inclusion, etc all in the same way.
21:08:08 <gwicke> the requirements section in https://phabricator.wikimedia.org/T130567 proposes several points that are derived from the use cases we are interested in
21:08:13 <DanielK_WMDE_> Balanced templates are a precondition to that
21:08:18 <cscott> for example, a safe DOM-based template mechanism is included in the scope of gwicke's RFC, but T114445 doesn't overhaul templates, it just patches a corner of the existing mechanism.
21:08:18 <stashbot> T114445: [RFC] Balanced templates - https://phabricator.wikimedia.org/T114445
21:08:37 <cscott> as DanielK_WMDE_ says
21:08:58 <subbu> https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability#What_are_good_DOM_fragments.3F also summarizes other rfcs related to dom fragments.
21:09:13 <gwicke> DanielK_WMDE_: yes, exactly; the requirements try to capture what it means for content to be "modular"
21:09:24 <cscott> one more example: the new Lua-based infobox stuff will, when used properly, always generate properly balanced output. but it doesn't actually use DOM manipulation under the hood (as I understand it).
21:09:46 <gwicke> with a specific emphasis on the issues inherent in wikitext as a source format, such as establishing stable transclusion scopes
21:10:28 <cscott> so the goal of the general RFC is to more broadly state: we should be making tools that generate valid DOM. Pasting string fragments together should be increasingly deprecated going forward.
21:10:44 <gwicke> the scoping issue is at the heart of the discussion around different solutions
21:11:27 <gwicke> there are several proposals to use new syntax to establish such scopes for new content, either with opt-in or opt-out behavior for the default case
21:11:53 <DanielK_WMDE_> cscott: btw, i just commented on the balanced template rfc, asking about parameters. Unbalanced wikitext parameters can seriously screw with balanced templates...
21:12:17 <gwicke> and there is a proposal to investigate establishing those scopes automatically, based on a classification of templates as "unbalanced start template" vs. "normal, balanced"
21:12:35 <DanielK_WMDE_> gwicke: there are also different hacks to inject/transclude html. The graph extension, for instance.
21:13:04 <gwicke> DanielK_WMDE_: the good news is that the scoping issue is largely solved for those
21:13:21 <gwicke> as they tend to be tied to a tag extension, for example
21:13:26 <subbu> I think extensions for the most part generate DOMs
21:13:56 <DanielK_WMDE_> gwicke: yes, true. but whatever mechanism we come up with should be flexible enough to accommodate such tag extensions.
21:14:30 <gwicke> those extensions do share the general issue of enforcing content model constraints with transclusions, but the scope of their content tends to be fixed by syntax
21:14:33 <TimStarling> validation of every argument in the proposed system would take a long time
21:14:34 <subbu> the biggest problem with templates is not about the content they generate ... parsing them to html and back will make sure the output is a DOM .. it is about nesting constraints when the DOM fragment is inserted into the context .. and this is a problem for all fragment-producing constructs.
21:14:36 <DanielK_WMDE_> subbu: quite a few generated wikitext.
21:14:49 <cscott> DanielK_WMDE_: we'll be discussing the {{#balance}} proposal specifically in the second half hour.
21:14:49 <TimStarling> there are more arguments than templates
21:15:44 <subbu> DanielK_WMDE_, what TimStarling said. i think we can just look at the full output of the template and generate a DOM fragment .. it doesn't matter what generated the output as long as the output is a DOM fragment.
21:16:08 <gwicke> subbu: This goes back to the question of how modular we expect things to be. If we expect nested content to not affect / break surrounding content, then that content needs to be made to conform to constraints.
21:16:09 <subbu> so, individual parameters and what they are is not very relevant in that sense.
21:16:31 <cscott> DanielK_WMDE_: I think what you want is Template:BalanceEcho, and then if you are worried about your arguments you can do {{mytemplate|{{BalanceEcho|arg1}}}} etc.
21:16:51 <subbu> gwicke, right .. i am suggesting that there are 2 issues .. (a) easy: output being a dom fragment (b) hard: how that output is inserted into the page? all the juice is in problem (b) and that is where we should focus the discussion.
21:17:14 <cscott> DanielK_WMDE_: as you'll see, the proper balancing depends on what sort of context you expect for your argument, so it's not something you could necessarily do on a uniform basis for all arguments.
21:17:20 <DanielK_WMDE_> cscott: i'd actually love to have support for parameters that aren't wikitext, but plain text, or a json/lua structure
21:17:28 <robla> Gabriel's RFC says the three main approaches currently discussed are "opt-in", "opt-out" and "inference". is that the most important distinction? is making a choice between those three the most important first decision?
21:17:30 <cscott> DanielK_WMDE_: that's a different RFC of mine.
21:17:30 <gwicke> subbu: if we decide that the fragment needs to conform, then we are basically done
21:17:37 <DanielK_WMDE_> cscott: i know ;)
21:17:42 <gwicke> the issue then becomes a matter of implementation
21:17:44 <subbu> gwicke, https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability#Possible_approaches_for_handling_nesting_constraints
21:18:19 <subbu> but, in summary, i am not yet convinced that a single strategy for making templates conform is viable.
21:18:35 <gwicke> subbu: I personally think that anything but forcing components to conform would not lead to any useful amount of modularity
21:18:39 <cscott> i agree w/ subbu, fwiw. the context is very important when determining the validity of a fragment.
21:18:39 <TimStarling> so the main question in this RFC is opt-in vs opt-out vs inference?
21:19:03 <DanielK_WMDE_> gwicke: so, if a template is rendered into a dom fragment for transclusion, what additional information would be attached to that fragment? In PHP, I would expect that DOM fragment to be wrapped in a ParserOutput object, so it can pull in resource loader modules, or set page props, etc. Do you agree?
21:19:21 <subbu> TimStarling, I think question 1 is https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability#How_are_DOM_fragments_identified_during_parse.3F and question 2 is https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability#Possible_approaches_for_handling_nesting_constraints
21:19:45 <cscott> (DanielK_WMDE_: the real problem with rich arguments is they look ugly in wikitext. adding support in VE would help a lot, since people could see rich editors for the arguments, instead of being confronted with a blob of ugly json in the middle of their wikitext)
21:20:21 <DanielK_WMDE_> cscott: i'm thinking of lua calls that return structures that are then passed on. inline JSON sucks.
21:20:23 <gwicke> DanielK_WMDE_: the DOM fragment would satisfy certain content model constraints, and the transclusion site would require some of those to be met
21:20:52 <gwicke> if the nested content does not meet the requirements (ex: is transcluded into a link, but contains another link), then we need to force one of the two to give in
21:21:00 <cscott> DanielK_WMDE_: where by "lua" i'll choose to hear "javascript". ;)
21:21:32 <subbu> I think those are 2 high level questions ... requirements of what is needed to be supported vis-a-vis parsoid, php parser, 3rd party wikis, performance ... will let us figure out which of those answers we want ... rfc 114445 proposes one set of answers to those 2 questions.
21:21:34 <gwicke> which might mean stripping the link from the transcluded content
21:21:36 <DanielK_WMDE_> gwicke: my point is that constraints are not enough.
21:21:48 <DanielK_WMDE_> gwicke: anything that can be in ParserOutput can come from any transclusion.
21:22:08 <cscott> gwicke: how are those requirements specified? it's not enough to say "the output is a fragment". you need to say "the output is an inline fragment" or "a block fragment" or some such. how is that done?
21:22:21 <TimStarling> I think it should be opt-in unless there is some really good argument to the contrary
21:22:37 <gwicke> cscott: the constraints are more along the lines of "no p" or "no a", IIRC
21:22:41 <DanielK_WMDE_> cscott: haskell
21:22:48 <cscott> gwicke: no, they are not that simple. see my RFC.
21:22:53 <gwicke> basically, you can compute the constraints from the parent DOM path
21:23:06 <TimStarling> we've talked through this before, that's why we're proposing {{#balance}}
21:23:10 <cscott> gwicke: you need to compute the constraints from the HTML5 parser state, in the general case.
21:23:29 <gwicke> that's another way of saying it, yes
21:23:48 <cscott> gwicke: well, i've tried that, and that way lies madness.
21:23:53 <subbu> I prefer opt-in myself as well.
21:23:54 <DanielK_WMDE_> TimStarling: as in {{#balance|block}}, {{#balance|inline}}, {{#balance|no}} ?
21:24:04 <cscott> {{#balance:block}} etc yes.
21:24:18 <TimStarling> yes
21:24:23 <cscott> {{#balance:block}}, {{#balance:inline}}, and {{#balance:none}}
21:24:28 <gwicke> "block" is not very specific
21:24:42 <cscott> which is what https://gerrit.wikimedia.org/r/279670 implements
21:24:48 <gwicke> <div>s can be nested, but <p>s cannot
21:24:54 <cscott> gwicke: yes, exactly. it is general enough that humans can understand it.
21:24:58 <subbu> https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability#Nesting_constraints_for_transclusions_and_task_T114445 is my summary of how cscott's RFC addresses those 2 high level questions i posted above.
21:24:58 <TimStarling> block is not literally HTML 4 block scope
21:25:19 <DanielK_WMDE_> gwicke: you can do additional checks based on the actual dom. the declaration doesn't need to be that detailed.
21:25:25 <cscott> "in select inside table mode in the HTML5 parser" is not usable by mortals
21:25:37 <cscott> TimStarling: yes, it's a little confusing in that way, but i think it's what normal people would expect by "block".
21:25:40 <TimStarling> we decided at the parsing team offsite that "block" is a good user-facing term
21:25:42 <gwicke> DanielK_WMDE_: yes, I was saying the same - the parent DOM establishes the constraints
21:26:05 <gwicke> syntax can add further constraints
21:26:11 <TimStarling> the exact definition of what it does will be similar to HTML 4 block content model, but not exactly the same
21:26:13 <DanielK_WMDE_> gwicke: but that's in addition to declaring the balancing mode. not instead.
21:26:16 <cscott> ok, time check: 4 minutes-ish left on the general question.
21:27:19 <gwicke> okay, so quick question: Is anybody seeing serious advantages in enforcing constraints in any other way than forcing the nested content to conform?
21:27:34 <cscott> "<div>s can be nested, but <p>s cannot", is a gross simplification. Something more accurate would be "address, article, aside, blockquote, center, details, dialog, dir, div, dl, fieldset, figcaption, figure, footer, header, hgroup, main, menu, nav, ol, p, section, summary, ul, h[1-6], pre, listing, form" will terminate an open <p> tag.
21:27:38 <DanielK_WMDE_> I actually prefer explicit opt-out for new wikis, with block level balancing as the default. For existing wikis, it would be opt-in, until all templates have a {{#balance}} declaration, at which point the default for the wiki can be changed, and it becomes opt out.
21:27:51 <subbu> gwicke, consider <p><b>{{list-producing-template}}</b></p>
21:27:55 <cscott> gwicke: i force constraints on the enclosing content as well.
21:27:56 <TimStarling> gwicke, are you proposing a particular option from https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability#Possible_approaches_for_handling_nesting_constraints ?
21:28:05 <subbu> so, the only way out in that scenario using your proposal would be to convert that list to plain text.
21:28:07 <TimStarling> if so, which letter?
21:28:08 <gwicke> TimStarling: yes
21:28:08 <cscott> there are context constraints and content constraints.
21:28:08 <subbu> that won't be acceptable
21:28:19 <gwicke> TimStarling: b
21:28:46 <cscott> i'm proposing b+c
21:28:56 <TimStarling> SodaAnt: b is quite different to applying the HTML 5 parsing algorithm which has been proposed to date
21:29:02 <TimStarling> s/SodaAnt/so
21:29:28 <gwicke> b is what we have been discussing a lot in the last ~3 years
21:29:41 <TimStarling> the HTML 5 parsing algorithm is mostly c, right?
21:29:51 <subbu> I think it should be b+c depending on specific type annotation on the template ... in addition, i think we should leave the option open for (d)/(e) for the generic case that is too hairy.
21:29:51 <TimStarling> well, maybe b+c like cscott says
21:30:03 <cscott> TimStarling: no, i think you need b+c in order to allow interesting inclusions.
21:30:08 <gwicke> subbu: I agree that the outcome is not always ideal, but if we want modularity, then it seems to be the only possible solution
21:30:20 <cscott> you'll see that {{#balance:block}} is mostly c and {{#balance:inline}} is mostly b
21:30:34 <TimStarling> ok
21:30:42 <cscott> neither covers all the interesting use cases
21:31:42 <cscott> for example, even in {{#balance:inline}} you have to do some minimal c-style munging of the context to ensure that you don't find yourself in strange parsing modes
21:31:50 <gwicke> okay, I don't think we have enough time to discuss pros & cons of opt-in vs. opt-out vs. inference
21:32:14 <TimStarling> what should the meeting agenda be now?
21:32:43 <cscott> <ruby> for example -- if you want to allow <ruby> *inside* an inline template, you need to make sure it's not open outside it.
21:32:51 <gwicke> my impression from the discussion is that there is no full agreement on the requirements yet
21:33:04 <cscott> TimStarling: I'd like to discuss the {{#balance}} implementation more specifically.
21:33:14 <cscott> it may be that it sheds some light on the general question.
21:33:15 <gwicke> so I think we need to follow up on those later
21:33:36 <TimStarling> ok, let's discuss #balance
21:33:58 <cscott> so, i've updated https://phabricator.wikimedia.org/T114445 to match the latest implementation and spec proposal
21:33:59 <subbu> gwicke, i think there is agreement on the high-level requirements (performance, wysiwyg, etc.) ... but, how to meet those requirements via specific solutions requires more detailed requirements wrt. 3rd party wikis, parsoid, php parser, old revisions.
21:34:23 <DanielK_WMDE_> subbu: ...<translate>...
21:34:41 <cscott> and there's a (written but not tested) implementation in https://gerrit.wikimedia.org/r/279670 that should give a good idea of the scope of the implementation, where it fits in the parser pipeline, etc.
21:34:51 <subbu> DanielK_WMDE_, i think <translate> as it exists needs to be migrated over to a different solution gradually.
21:34:53 <gwicke> subbu: the strawman requirement "Transclusions do not affect surrounding content." in particular is still controversial
21:35:05 <cscott> it's worth noting that my ideas have migrated somewhat over the past few weeks as our initial proposals encountered the realities of the HTML5 spec.
21:35:46 <cscott> there's a full PHP HTML5 tree builder implementation in https://gerrit.wikimedia.org/r/279669, but it turned out not to be necessary for {{#balance}} in the end.
21:36:13 <cscott> should I summarize the RFC, or give folks time to read it, or what?
21:36:32 <DanielK_WMDE_> gwicke: it would certainly be nice if it was true.
21:36:41 <DanielK_WMDE_> gwicke: it would be even nicer if the reverse was also true.
21:36:55 <TimStarling> cscott: what are the questions you would like answered?
21:37:15 <cscott> i'll enumerate some.
21:37:28 <gwicke> DanielK_WMDE_: yeah, but I think we have to pick some compromise
21:38:04 <cscott> 1. there's a choice between a whitelist and blacklist approach for {{#balance:block}} -- in the whitelist approach the only tags allowed to remain open are <div> and <section> (that last is forward-looking). thoughts about that choice would be welcome.
21:38:53 <cscott> 2. whether the set of tags closed by block/inline is sufficient to allow interesting stuff. ie, "but I want <a> inside my inline templates"
21:39:20 <gwicke> cscott: why do you think it is necessary to manually specify the context constraints?
21:39:23 <cscott> 3. whether there should be additional modes (again, maybe an inline mode which allowed <a> tags but closed them in the context.)
21:39:50 <cscott> 4. tables are problematic. there could in theory be three separate table modes (at least) for outer scope, row, and cell context. do we need all that?
21:40:15 <cscott> 5. whether silently stripping bad tags is good enough, or do we want a more obvious "error" mechanism.
21:40:37 <DanielK_WMDE_> gwicke: my intuition is that it's not absolutely necessary to declare the balancing mode, but it would be useful.
21:40:39 <cscott> and 6, stealing from gwicke: is there an alternative to manually specifying the context constraints?
21:40:55 <cscott> ok, that's a handful of questions i'd like to hear opinions on.
21:41:00 <gwicke> I'm rather wondering about the motivation for doing so
21:41:01 <TimStarling> right, in 20 minutes
21:41:02 <subbu> gwicke, DanielK_WMDE_, those type annotations can also help tools like VE figure out what to do with edits.
21:41:09 <TimStarling> 3 minutes per question then
21:41:26 <subbu> effectively {{#balance:*}} are type annotations on a template output
21:41:49 <subbu> same reason types are useful in programming languages and tooling is why it can be useful here as well.
21:41:55 <cscott> so for {{#balance:block}}, everything is permitted *inside* the template, but we ensure that all tags (other than some safe ones) are closed in the context.
21:42:08 <gwicke> also, 7, how will this work with old revisions?
21:42:32 <cscott> it's probably easiest to communicate to users if we said "only <div> and <section> are allowed to be open". but in fact there's a somewhat larger set of tags which are actually safe.
21:43:06 <TimStarling> why not use a larger whitelist?
21:43:16 <cscott> thoughts -- is it best to "start small", or "start permissive".
21:43:23 <gwicke> subbu: the question seems to be closer to "should we do type inference, or make the user write them out manually"
21:43:33 <cscott> TimStarling: I looked at all the rest of the tags which could be whitelisted, and i really didn't see any which I expected to be used in wikitext.
21:43:40 <TimStarling> right
21:44:01 <subbu> gwicke, right, but explicit annotations either way ... so that tools can access that type and so that every tool doesn't have to do type inference on template output.
21:44:06 <TimStarling> the list could be expanded later if new tags are added to Sanitizer's whitelist, right?
21:44:11 <gwicke> as cscott mentioned, it seems that the manual types aren't complete yet
21:44:31 <cscott> TimStarling: part of this question is, if anyone can think of one or two tags which are useful, then we could add them and still have a small whitelist. is a small whitelist worth having?
21:44:33 <gwicke> they don't capture some of the information that's already available in the DOM
21:45:29 <TimStarling> let's concentrate on addressing gwicke's and DanielK_WMDE_'s concerns
21:45:44 <cscott> gwicke is discussing the more general issue, but i think "ease of communicating the operation and ideas to humans" is an important design point, which is why I'm suggesting a small whitelist for {{#balance:block}}
21:46:12 <cscott> and why I don't think a hyper-specific inference engine is a good idea. humans are going to get confused by the plethora of possible insertion modes theoretically possible.
21:46:45 <DanielK_WMDE_> cscott: +1
21:46:49 <cscott> to be concrete, look at http://w3c.github.io/html/ in the table of contents, under "the rules for parsing tokens in HTML content"
21:46:59 <gwicke> for the most part, fix-ups in browsers are "just working" from a user perspective
21:47:00 <subbu> either we go with strategy (b) everywhere OR we pick a very clearly communicable set of type annotations as in cscott's RFC 114445.
21:47:03 <cscott> there are 23 different insertion modes
21:47:34 <gwicke> the percentage of web users who are aware of the adoption agency algorithm is very small, and that's fine
21:47:55 <subbu> gwicke, but, by that same token .. browsers don't guarantee that surrounding context won't be affected.
21:47:55 <DanielK_WMDE_> I still want to know how the DOM fragments will be represented internally, and how additional info like sitelinks etc that were generated while parsing will be passed back to the caller
21:48:05 <DanielK_WMDE_> shall we have a ParserOutput object for every template transclusion?
21:48:05 <subbu> adoption agency algorithm is an example of that.
21:48:10 <cscott> DanielK_WMDE_: perhaps you want to look at my gerrit patchset?
21:48:16 <gwicke> so I think that's a data point showing that fix-ups in a "close to expected" way are possible to implement transparently
21:48:36 <DanielK_WMDE_> cscott: perhaps i do :)
21:48:50 <cscott> DanielK_WMDE_: https://gerrit.wikimedia.org/r/#/c/279670/4/includes/parser/Preprocessor.php
21:49:10 <TimStarling> DanielK_WMDE_: in MW, we just want to make the output roughly the same as parsoid, we're not actually targeting incremental parsing or whatever
21:49:12 <subbu> can we take a step back to figure out what the contentious question is that needs resolution?
21:49:25 <TimStarling> so there's no need for separate ParserOutput there
21:49:45 <subbu> https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability#Implementing_incremental_parsing_in_Parsoid lists what is needed for implementing incremental parsing ..
21:49:46 <gwicke> subbu: to me, it is a) how modular do we want transclusions to be?
21:49:47 <TimStarling> in parsoid, there is indeed already a separate ParserOutput for each top-level template
21:49:56 <cscott> In my implementation I just do a single pass over the output, after all the transclusions have been applied. "Just before tidy."
21:49:59 <subbu> and that shows it is hard to do incremental parsing in php parser as it exists today.
21:50:04 <TimStarling> and e.g. categories generated by templates are converted by parsoid to localised meta tags
21:50:09 <cscott> s/implementation/PHP implementation/
21:50:36 <gwicke> b) should we go with the proposed opt-in direction, and does it get us the benefits we are hoping for?
21:50:40 <cscott> but i've specified the semantics such that the output is identical to the output you'd obtain by processing the template and context in isolation.
21:50:49 <Krinkle> cscott: https://html.spec.whatwg.org/multipage/syntax.html#the-rules-for-parsing-tokens-in-html-content
21:51:32 <gwicke> and c) if we go with opt-in, should we use constraints from the context, or make the user specify the intention manually?
21:51:35 <Krinkle> https://html.spec.whatwg.org/multipage/syntax.html#parsing-main-inhtml *
21:52:17 <cscott> generally speaking we want to stay in the `"in body" insertion mode`, and the constraints I've specified for {{#balance}} ensure that we start there before the template and end there after the template.
21:52:22 <DanielK_WMDE_> cscott: your proposal still follows the "transclude first, then parse" approach. To allow non-wikitext content to be transcluded, I'd like to get away from that.
21:52:27 <subbu> cscott, TimStarling DanielK_WMDE_ do gwicke's 3 questions look like the questions that need resolution?
21:52:56 <cscott> DanielK_WMDE_: my PHP implementation does. But like I said, the semantics are written so that you get identical output if you balance first and then transclude.
21:53:03 <cscott> that's the whole point, actually.
21:53:21 <DanielK_WMDE_> cscott: i agree that it's a good first step
21:53:46 <cscott> if you apply these rules to your content and your inclusion site, then you can use any number of different ways to process the components and still be guaranteed that nothing bad will happen when you combine them together.
21:54:06 <TimStarling> subbu: well, these are questions that already have firm answers in the existing proposal
21:54:30 <cscott> the parsoid implementation will probably do the "balance first, then transclude" ordering, because "there is indeed already a separate ParserOutput for each top-level template"
21:54:34 <TimStarling> I think if gwicke wants to block the existing implementation then the onus is on him to explain why it is a bad idea
21:55:35 <gwicke> TimStarling: don't mistake my questions for a desire to block you; I would just like to see them answered, and the current RFC does not set out why the options it chose are actually the best ones / satisfy the most reasonable set of requirements
21:55:46 <TimStarling> fair comment
21:55:57 <subbu> gwicke, what concerns you with the opt-in direction wrt benefits?
21:56:23 <gwicke> one concern is old revisions
21:56:31 <gwicke> and coverage
21:56:36 <subbu> cscott, old revisions?
21:56:41 <subbu> can you address that?
21:56:49 <subbu> or point to the rfc section that addresses it.
21:56:52 <cscott> opt-in was chosen specifically to remove obstacles from the critical path and start getting content converted. it doesn't preclude opt-out-ish stuff -- you could still use an automatic inference tool to find places where balance is safe, and then automatically opt them in.
21:57:07 <cscott> i don't understand what the question is with old revisions?
21:57:25 <TimStarling> there's no proposal to change the handling of old revisions
21:57:34 <cscott> that is, you can use an automatic inference tool if you have one. it just removes that tool from the critical path.
21:58:05 <gwicke> so, basically the performance and composition benefits would only happen once 100% of transclusions in an article revision are explicitly marked for balancing
21:58:20 <subbu> not true at least for performance.
21:58:28 <cscott> no, you can do fast substitutions of any top-level balanced transclusion.
21:58:44 <cscott> the infobox template alone could account for millions of fast transclusions
21:58:47 <subbu> right.
21:58:58 <gwicke> subbu: any preceding unbalanced template can still affect the remainder of the content
21:59:13 <TimStarling> only if it changes
21:59:19 <cscott> the context requirements mean that you can still do a straight subst of the transclusion
21:59:26 <subbu> right. what cscott and TimStarling said.
21:59:26 <Scott_WUaS> Thank you, Gabriel, Tim, Subbu, CScott, DanielK_WMDE and all!
21:59:29 <gwicke> TimStarling: right, but that's a given
21:59:30 <cscott> you have to reparse the top-level page, but *not* the template.
21:59:55 <cscott> that's (one of) the benefit(s) of subbu's option (c)
22:00:01 <TimStarling> any action items to wrap up?
22:00:06 <gwicke> that's already the case right now, isn't it?
22:00:13 <gwicke> we can already reuse template content
22:00:18 <subbu> #link https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability
22:00:36 <subbu> gwicke, no .. see https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Document_Composability#Situation_today_in_core_parser_and_Parsoid about ad-hoc-ness
22:00:47 <robla> program note: we tentatively scheduled next week's IRC topic: https://phabricator.wikimedia.org/E162
22:00:53 <cscott> no, templates can foster stuff arbitrarily far into the surrounding page. as just one example.
22:01:06 <gwicke> that's detected, though
22:01:08 <robla> next week: https://phabricator.wikimedia.org/T91162
22:01:10 <subbu> TimStarling, I think we need to prepare an outline of contentious questions and answers or if unresolved what is needed to resolve them.
22:01:16 * DanielK_WMDE_ wants to hear more about parameter handling
22:01:37 <cscott> i have to run promptly-ish today to pick up my kids (we're going to a red sox game tonight)
22:01:46 <cscott> but i can follow up on the phab ticket w/ you if you like.
22:01:50 <TimStarling> I suggest having any followup discussion on #mediawiki-parsoid
22:02:06 <cscott> i think an echo-like template would probably handle any use cases involving parameter handling.
22:02:06 <TimStarling> and on the phab ticket if you want cscott to talk to you :)
22:02:08 <Alsee> I would like to ask a more basic question. I understand why balanced templates make things easier and better for the machine side, but is there any benefit for the human side? It seems to be trading off complexity on the human side to make it easier for the machine.
22:02:22 <TimStarling> Alsee: the meeting time is over now
22:02:27 <subbu> Alsee, no complexity on the normal editor side.
22:02:28 <cscott> Alsee: no more forgetting a close tag and turning your entire article bold face
22:02:45 * YairRand is mildly confused as to why the syntax is {{#balance:block}} (a parser function) instead of a behavior switch (__BALANCEBLOCK__), which seems to be more used for this kind of thing. (is behavior switch syntax deprecated?)
22:02:50 <cscott> Alsee: in general broken templates shouldn't break the rest of your article, or prevent you from editing the rest of the article
22:02:53 <TimStarling> #endmeeting
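
Illustration (not part of the log): two rules stated in the discussion can be sketched in a few lines of Python. The first is cscott's list of start tags that terminate an open <p>; the second is the whitelist variant of {{#balance:block}}, where only <div> and <section> may remain open in the context around the template. This is a minimal sketch under stated assumptions, not the code in https://gerrit.wikimedia.org/r/279670. The function and constant names are hypothetical, the HTML5 "button scope" check is ignored, and the choice to close everything from the outermost non-whitelisted tag inwards is an assumption made for the example.

```python
# Illustrative sketch only: all names here are hypothetical and are not taken
# from the Gerrit patch, MediaWiki, or Parsoid.

# Start tags that terminate an open <p>, as enumerated by cscott in the log
# (h[1-6] expanded to h1..h6).
P_CLOSING_TAGS = frozenset(
    "address article aside blockquote center details dialog dir div dl "
    "fieldset figcaption figure footer header hgroup main menu nav ol p "
    "section summary ul h1 h2 h3 h4 h5 h6 pre listing form".split()
)

# Whitelist variant of {{#balance:block}}: only these tags may stay open in
# the context around the template (per cscott's question 1).
BLOCK_CONTEXT_WHITELIST = frozenset({"div", "section"})


def closes_open_p(open_tags, start_tag):
    """Return True if start_tag would implicitly close an open <p>.

    Assumption: any <p> in open_tags counts as open; the HTML5 notion of
    "button scope" is ignored to keep the sketch small.
    """
    return "p" in open_tags and start_tag in P_CLOSING_TAGS


def tags_to_close_for_block(open_tags):
    """Return the tags to close, innermost first, before a block template.

    open_tags lists the open elements from outermost to innermost.
    Assumption: everything from the outermost non-whitelisted tag inwards is
    closed, since an element cannot be closed while its descendants stay open.
    """
    for index, tag in enumerate(open_tags):
        if tag not in BLOCK_CONTEXT_WHITELIST:
            return list(reversed(open_tags[index:]))
    return []


if __name__ == "__main__":
    # Context from subbu's example: a list-producing template inside <p><b>.
    context = ["p", "b"]
    print(closes_open_p(context, "ul"))
    print(tags_to_close_for_block(context))
    print(tags_to_close_for_block(["div", "section"]))
```

In this sketch a list emitted inside <p><b> would both terminate the paragraph and, under the block whitelist, force <b> and <p> to be closed in the context first, which is the kind of interaction between content and context constraints that the b+c discussion is about.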

daniel renamed this event from RFC Meeting: Hygienic transclusions and balanced templates (2016-04-13, #wikimedia-office) to ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office). Nov 21 2016, 6:11 PM
daniel changed the host of this event from RobLa-WMF to daniel.
daniel updated the event description.
daniel renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to RFC Meeting: Hygienic transclusions and balanced templates (2016-04-13, #wikimedia-office).