
Unbalanced annotation tags in template arguments
Open, Needs Triage, Public

Description

Given interleaved markup like {{1x|<translate>123}}</translate>, the legacy parser sees the full content, so it matches the start and end tags and strips them.

When Parsoid asks for the template expansion, the legacy preprocessor sees only the template itself, {{1x|<translate>123}}, which contains the opening tag but not the closing one, so it leaves the tag unstripped.

The result is a difference in parser output.
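
The difference can be illustrated with a minimal sketch. It assumes a simplified stripping rule (tags are removed only when opening and closing tags balance within the text being processed); `strip_if_balanced` is a hypothetical helper for illustration, not the actual legacy parser or Translate extension code.

```python
OPEN, CLOSE = "<translate>", "</translate>"


def strip_if_balanced(text: str) -> str:
    """Hypothetical model: remove the tags only when they are balanced."""
    if text.count(OPEN) == text.count(CLOSE):
        return text.replace(OPEN, "").replace(CLOSE, "")
    # Unbalanced within this piece of text: leave the tags in place.
    return text


# The legacy parser sees the whole page content.
full_page = "{{1x|<translate>123}}</translate>"
# For a Parsoid expansion request, only the template itself is seen.
template_only = "{{1x|<translate>123}}"

whole_view = strip_if_balanced(full_page)
fragment_view = strip_if_balanced(template_only)

print(whole_view)
print(fragment_view)
```

Under this assumption, the full view loses both tags, while the fragment keeps the dangling opening tag, so the two paths expand different template arguments.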

This came up today while examining roundtrip testing results. An example of the parse differences can be seen at:
https://meta.wikimedia.org/w/index.php?title=Campaigns/Foundation_Programs_Team&oldid=25730306&useparsoid=1#V0_Event_Registration_Tool:_Demo_and_Invitation_to_Test

That example is slightly more complex: the <tvar name=date1> that isn't stripped contains an unescaped =, which combines with the template syntax to form a named template argument, leaving the unnamed first argument empty:

{{indent| <translate><!--T:1177--> '''Session 1:'''  <tvar name=date1> {{DateT|2022|7|21}}, 17:00 UTC}} </tvar> </translate>
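
A rough sketch of why the unstripped tag changes the argument shape follows. It assumes a deliberately simplified model of template argument parsing (split on top-level pipes, then treat the first = in an argument as the name/value separator); `split_args` and `parse_template` are hypothetical helpers, not the MediaWiki preprocessor.

```python
def split_args(body: str) -> list[str]:
    """Split a template body on '|' while ignoring pipes inside nested {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        two = body[i:i + 2]
        if two == "{{":
            depth += 1
            current.append(two)
            i += 2
        elif two == "}}":
            depth -= 1
            current.append(two)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(body: str):
    """Return (name, positional args, named args) for the text between {{ and }}."""
    name, *raw_args = split_args(body)
    positional, named = [], {}
    for arg in raw_args:
        # Simplification: the first '=' anywhere in the argument makes it named.
        if "=" in arg:
            key, value = arg.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(arg.strip())
    return name.strip(), positional, named


# Tags left unstripped, as in the example above.
unstripped = ("indent| <translate><!--T:1177--> '''Session 1:'''  "
              "<tvar name=date1> {{DateT|2022|7|21}}, 17:00 UTC")
# What the argument might look like had the tags been stripped (illustrative).
stripped = "indent| '''Session 1:''' {{DateT|2022|7|21}}, 17:00 UTC"

name_u, pos_u, named_u = parse_template(unstripped)
name_s, pos_s, named_s = parse_template(stripped)
print(pos_u, named_u)
print(pos_s, named_s)
```

In this model the unstripped input yields no positional argument at all, because everything before the = in name=date1 becomes the argument name, whereas the stripped input yields a single unnamed first argument.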

The reliance on legacy stripping comes from T295834. T301492 and T301490 raise some open questions about doing the stripping in Parsoid instead. Either way, should unmatched tags be left in place?