
Template generated data-mw doesn't contain href
Closed, Resolved · Public

Description

The Parsoid spec gives an example of the template-to-data-mw translation:
template: {{foo|unused value|paramname=used value}}
data-mw: {"parts": [{"template":{"target":{"wt":"foo","href":"./Template:Foo"},"params":{"1":{"wt":"unused value"},"paramname":{"wt":"used value"}},"i":0}}]}
So I expect every template-generated data-mw to have an "href" attribute in its "target" section.
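For reference, here is a minimal Python sketch of how the expected "href" relates to the target's "wt". It is a hypothetical simplification, not Parsoid's code. It assumes the target is a plain template name, drops HTML comments, collapses whitespace and underscores, uppercases the first letter, and prefixes a namespace name supplied by the caller ("Template" on English wikis; localized wikis use their own namespace name). Explicit namespace prefixes, parser functions, and per-wiki case rules are ignored.

```python
import re
from typing import Optional


def template_href(wt: str, namespace: str = "Template") -> Optional[str]:
    """Hypothetical: derive a './Namespace:Title' href from a template target's wikitext."""
    # Drop HTML comments; they carry no part of the title.
    name = re.sub(r"<!--.*?-->", "", wt, flags=re.DOTALL)
    # Treat runs of whitespace and underscores as one space, then trim the ends.
    name = re.sub(r"[\s_]+", " ", name).strip()
    if not name:
        return None  # nothing left to link to
    # Assume a wiki where the first letter of a title is case-insensitive.
    name = name[0].upper() + name[1:]
    return f"./{namespace}:{name.replace(' ', '_')}"


print(template_href("foo"))  # ./Template:Foo
print(template_href("Infobox settlement\n<!-- usage notes -->\n\n"))  # ./Template:Infobox_settlement
```

Under these assumptions the comment and trailing newlines in a target should not prevent an href from being computed; the ruwiki target below would map to "./Шаблон:Малая_планета" when called with namespace="Шаблон".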

This holds in most cases, but I have recently observed exceptions, for example:

  • en/IVL_K.1_Kurki: data-mw='{"parts":[{"template":{"target":{"wt":"Infobox settlement\n<!--See the Table at Infobox Settlement for all fields and descriptions of usage-->\n\n"},"params":...
  • ru/(7079)_Багдад: data-mw='{"parts":[{"template":{"target":{"wt":"Малая планета\n<!-- Основной блок -->\n "},"params":...
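One way to spot such cases in bulk is to parse each data-mw value and list the template parts whose "target" has no "href". The helper below is a hypothetical Python sketch, not part of Parsoid. The second input is a reduced data-mw that keeps only the target quoted from IVL_K.1_Kurki, since the params are elided above.

```python
import json


def templates_missing_href(data_mw: str) -> list:
    """Hypothetical: return the target 'wt' of every template part lacking an href."""
    missing = []
    for part in json.loads(data_mw).get("parts", []):
        # Parts are either plain wikitext strings or objects such as {"template": {...}}.
        if not isinstance(part, dict) or "template" not in part:
            continue
        target = part["template"].get("target", {})
        if "href" not in target:
            missing.append(target.get("wt", ""))
    return missing


spec = '{"parts": [{"template":{"target":{"wt":"foo","href":"./Template:Foo"},"params":{"1":{"wt":"unused value"},"paramname":{"wt":"used value"}},"i":0}}]}'
# Reduced example: only the target of the IVL_K.1_Kurki template is kept.
reduced = '{"parts":[{"template":{"target":{"wt":"Infobox settlement\\n<!--See the Table at Infobox Settlement for all fields and descriptions of usage-->\\n\\n"}}}]}'
print(templates_missing_href(spec))     # []
print(templates_missing_href(reduced))  # a one-element list holding the Infobox settlement target wt
```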

This looks like a bug in wikitext handling.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 22 2016, 8:09 PM
Arlolra triaged this task as Normal priority. · Apr 22 2016, 8:47 PM
Arlolra added a subscriber: Arlolra.

Here's a simplified test case:

{{echo
<!-- test -->
 | hi
}}

Oddly, the space before "| hi" makes a difference. Tokens with the space:

[{"type":"SelfclosingTagTk","name":"template","attribs":[{"k":["echo",{"type":"NlTk","dataAttribs":{"tsr":[6,7]}},{"type":"SelfclosingTagTk","name":"meta","attribs":[{"k":"typeof","v":"mw:EmptyLine"}],"dataAttribs":{"tokens":[{"type":"COMMENT","value":" test ","dataAttribs":{"tsr":[7,20]}},"\n"],"tsr":[7,21]}}," "],"v":"","srcOffsets":[2,22]},{"k":"","v":[" hi",{"type":"NlTk","dataAttribs":{"tsr":[26,27]}}],"srcOffsets":[23,23,23,27]}],"dataAttribs":{"tsr":[0,29],"src":"{{echo\n<!-- test -->\n | hi\n}}"}}]

vs. tokens without the space:

[{"type":"SelfclosingTagTk","name":"template","attribs":[{"k":["echo",{"type":"NlTk","dataAttribs":{"tsr":[6,7]}},{"type":"COMMENT","value":" test ","dataAttribs":{"tsr":[7,20]}},{"type":"NlTk","dataAttribs":{"tsr":[20,21]}}],"v":"","srcOffsets":[2,21]},{"k":"","v":[" hi",{"type":"NlTk","dataAttribs":{"tsr":[25,26]}}],"srcOffsets":[22,22,22,26]}],"dataAttribs":{"tsr":[0,28],"src":"{{echo\n<!-- test -->\n| hi\n}}"}}]
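Comparing the two dumps: with the leading space, the comment and its newline are wrapped in a meta token typed mw:EmptyLine and a stray " " string follows; without it, the key is just "echo", newline tokens, and a plain COMMENT token. Either way the template name ought to come out as "echo". The Python sketch below is a hypothetical flattening, not Parsoid's actual code: it keeps string and newline tokens, drops every other token, and trims. The key arrays are copied from the dumps with dataAttribs omitted.

```python
from typing import Any, List

# A token is either a plain string or a dict shaped like the JSON dumps above.
Token = Any


def target_text(tokens: List[Token]) -> str:
    """Hypothetical: flatten a template key's token array into a plain name."""
    parts = []
    for tok in tokens:
        if isinstance(tok, str):
            parts.append(tok)
        elif tok.get("type") == "NlTk":
            parts.append("\n")
        # Every other token (COMMENT, or a meta such as mw:EmptyLine) adds no text.
    return "".join(parts).strip()


# "k" arrays from the two dumps, with dataAttribs omitted for brevity.
with_space = [
    "echo",
    {"type": "NlTk"},
    {"type": "SelfclosingTagTk", "name": "meta",
     "attribs": [{"k": "typeof", "v": "mw:EmptyLine"}]},
    " ",
]
without_space = [
    "echo",
    {"type": "NlTk"},
    {"type": "COMMENT", "value": " test "},
    {"type": "NlTk"},
]
print(target_text(with_space), target_text(without_space))  # echo echo
```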

Change 281037 had a related patch set uploaded (by Subramanya Sastry):
WIP: Use mediawiki-title package to replace homegrown Title code

https://gerrit.wikimedia.org/r/281037

ssastry moved this task from Backlog to In Progress on the Parsoid board. · May 25 2016, 7:46 PM
cscott added a subscriber: cscott. · Jun 30 2016, 6:55 PM

> Oddly, the space before | hi makes a difference.

Probably because we weren't tokenizing this correctly; see
https://gerrit.wikimedia.org/r/#/c/295741/2/lib/wt2html/pegTokenizer.pegjs.txt

> Probably because we weren't tokenizing this correctly;

No, your patch still produces the mw:EmptyLine meta:

[{"type":"SelfclosingTagTk","name":"template","attribs":[{"k":["1x",{"type":"NlTk","dataAttribs":{"tsr":[4,5]}},{"type":"SelfclosingTagTk","name":"meta","attribs":[{"k":"typeof","v":"mw:EmptyLine"}],"dataAttribs":{"tokens":[{"type":"COMMENT","value":" test ","dataAttribs":{"tsr":[5,18]}},"\n"],"tsr":[5,19]}}," "],"v":"","srcOffsets":[2,20,20,20]},{"k":"","v":[" hi",{"type":"NlTk","dataAttribs":{"tsr":[24,25]}}],"srcOffsets":[21,21,21,25]}],"dataAttribs":{"tsr":[0,27],"src":"{{1x\n<!-- test -->\n | hi\n}}"}}]
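To check whether a tokenizer change still wraps the comment in an EmptyLine meta inside the template name, one can scan the key tokens for a meta tag whose typeof is mw:EmptyLine. The helper below is a hypothetical Python sketch; its inputs are the key array from the dump above and the key from the no-space case earlier, both with dataAttribs omitted.

```python
from typing import Any, List


def has_empty_line_meta(tokens: List[Any]) -> bool:
    """Hypothetical: True if any token is a meta tag typed mw:EmptyLine."""
    for tok in tokens:
        if not isinstance(tok, dict) or tok.get("name") != "meta":
            continue
        # Compare attributes pairwise; keys may in general be token arrays, not strings.
        if any(a.get("k") == "typeof" and a.get("v") == "mw:EmptyLine"
               for a in tok.get("attribs", [])):
            return True
    return False


patched_key = ["1x", {"type": "NlTk"},
               {"type": "SelfclosingTagTk", "name": "meta",
                "attribs": [{"k": "typeof", "v": "mw:EmptyLine"}]},
               " "]
no_space_key = ["echo", {"type": "NlTk"},
                {"type": "COMMENT", "value": " test "}, {"type": "NlTk"}]
print(has_empty_line_meta(patched_key), has_empty_line_meta(no_space_key))  # True False
```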

The mediawiki-title patch takes care of this, btw.

Change 281037 merged by jenkins-bot:
Use mediawiki-title package to replace homegrown Title code

https://gerrit.wikimedia.org/r/281037

ssastry closed this task as Resolved. · Jul 23 2016, 5:05 AM
ssastry claimed this task.

The fix has not yet been deployed to production, but deployment will probably happen within the next 2 weeks.