
Embedded wikitext infobox handling in Parsoid data-mw
Closed, Resolved · Public

Description

Hi Parsoid developers,

It seems Parsoid's data-mw doesn't separate an embedded infobox from the parameter that precedes it. For example,
the page http://en.wikipedia.org/wiki/New_York_Botanical_Garden has this wikitext:

{{Infobox Museum
...
 | website       = {{URL|http://www.nybg.org/}}
{{Infobox NRHP  // we expect this infobox not to be mixed into the preceding "website" parameter
 | embed = yes
 | nrhp_type = nhl
...
}}
}}

The data-mw folds the embedded infobox into the value of "website", which is unexpected:

"website":{"wt":"{{URL|http://www.nybg.org/}}\n{{Infobox NRHP\n | embed = yes\n | nrhp_type = nhl\n | ...}}"},

Can you separate the embedded infobox so that it doesn't pollute the "website" parameter? We noticed that data-mw gives the expected result when the embedded infobox is assigned to its own key. For example,
the page http://en.wikipedia.org/wiki/David_Beckham has this wikitext:

{{Infobox person
...
| website          = [http://www.davidbeckham.com davidbeckham.com]
| module           =
{{Infobox football biography  // this infobox is treated as the value of "module"
| embed            = yes
| position         = [[Midfielder]]
...
}}
}}

and we have data-mw:

"module":{"wt":"{{Infobox football biography\n| embed            = yes\n|...}}"}

Is it possible to automatically add a "module" parameter for the embedded infobox in the first example?

Thanks,

Event Timeline

Renxiaoyi created this task. Aug 1 2016, 6:31 AM

Unfortunately, this is correct behavior. Consider the transclusion below:

{{1x|foo
bar
baz
}}

Here, "foo\nbar\nbaz" is part of the argument passed into the template. That is expected parsing behavior for transclusions. The infobox transclusion above is similar, so Parsoid's interpretation is correct.
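
To illustrate the point about newlines, here is a minimal Python sketch of argument splitting. It is a simplified model, not Parsoid's actual tokenizer: the function name `split_template_args` is hypothetical, and it assumes that only a top-level `|` separates arguments while `{{...}}` and `[[...]]` nest.

```python
def split_template_args(body: str) -> list:
    """Split the inside of a {{...}} transclusion on top-level '|' only.

    Simplified model (assumption): newlines are ordinary characters, and
    pipes inside nested {{...}} or [[...]] do not separate arguments.
    """
    parts, current, depth = [], [], 0
    i = 0
    while i < len(body):
        two = body[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            current.append(two)
            i += 2
        elif two in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(two)
            i += 2
        elif body[i] == "|" and depth == 0:
            # A pipe at nesting depth 0 ends the current argument.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


# The first element is the template name, the rest are arguments.
simple = split_template_args("1x|foo\nbar\nbaz\n")

# Abbreviated version of the Infobox Museum example from the task.
nested = split_template_args(
    "Infobox Museum\n | website = {{URL|http://www.nybg.org/}}\n"
    "{{Infobox NRHP\n | embed = yes\n | nrhp_type = nhl\n}}\n"
)
```

Under this model `simple` has one argument containing all three lines, and `nested` has a single "website" argument that includes the whole embedded infobox, because nothing at the top level separates the two.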

The output renders correctly because the nested transclusion emits a series of table rows, which implicitly close the table cell and table row holding the website content. In other words, the template and the transclusion treat their input and output as plain strings that are concatenated to generate the final HTML string for the infobox, so there isn't much Parsoid can do at this point. The template and its uses would need to be fixed so that they do NOT rely on this behavior; until then, be aware that these infoboxes cannot be edited structurally in HTML editors.
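
As a toy illustration of that string concatenation (not the real template code), the Python sketch below assumes an infobox that emits wikitext table rows. The helpers `render_row` and `render_embedded_infobox`, the labels, and the "(website link)" placeholder are all hypothetical.

```python
def render_row(label: str, value: str) -> str:
    # In wikitext table syntax, "|-" at the start of a line begins a new row.
    return f"|-\n! {label}\n| {value}\n"


def render_embedded_infobox(fields: dict) -> str:
    # Assumption: with embed = yes, the nested infobox emits bare rows
    # and no table wrapper of its own.
    return "".join(render_row(k, v) for k, v in fields.items())


# The "website" argument carries its own value plus the nested rows.
website_arg = "(website link)\n" + render_embedded_infobox({"NRHP type": "nhl"})

# The outer infobox pastes the argument into a cell as a plain string.
table = "{|\n" + render_row("Website", website_arg) + "|}"
print(table)
```

The "|-" line emitted by the nested rows ends the "Website" cell and row, so the result still reads as a well-formed table even though the nested infobox was never a separate argument.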

Anyway, we are making slow progress towards a more structured model for templates, which would clean up this kind of wikitext in the long run. In the short run, sorry, we cannot help you more here.

ssastry closed this task as Resolved. Aug 15 2016, 10:18 PM
ssastry claimed this task.

I agree that a newline "\n" is not a good signal for separating parameters; however, in this case we have a nested infobox with the attribute "embed=yes", which may be a strong signal for making a segmentation.

It's not an urgent feature request. Looking forward to the more structured output of Parsoid.
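
In the meantime, a consumer of data-mw could apply the "embed=yes" heuristic on its own side. The following Python sketch is only an assumption about how that might look: `split_embedded_infobox` is a hypothetical helper, the sample string is abbreviated from the "website" value in the description, and the brace matching is deliberately naive.

```python
import re
from typing import Optional, Tuple

# Heuristic signal: a parameter "embed = yes" somewhere in the transclusion.
EMBED_FLAG = re.compile(r"\|\s*embed\s*=\s*yes\b", re.IGNORECASE)


def split_embedded_infobox(wt: str) -> Tuple[str, Optional[str]]:
    """Return (remaining value, embedded transclusion or None)."""
    depth, start, i = 0, 0, 0
    while i < len(wt):
        two = wt[i:i + 2]
        if two == "{{":
            if depth == 0:
                start = i  # remember where a top-level transclusion opens
            depth += 1
            i += 2
        elif two == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                block = wt[start:i]
                if EMBED_FLAG.search(block):
                    # Cut the embedded block out of the parameter value.
                    return (wt[:start] + wt[i:]).strip(), block
        else:
            i += 1
    return wt, None


sample = (
    "{{URL|http://www.nybg.org/}}\n"
    "{{Infobox NRHP\n | embed = yes\n | nrhp_type = nhl\n}}"
)
value, module = split_embedded_infobox(sample)
```

Here `value` keeps only the URL transclusion and `module` holds the embedded infobox, which the caller could store under a key such as "module". A value with no "embed = yes" transclusion is returned unchanged.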

I agree that a newline "\n" is not a good signal for separating parameters; however, in this case we have a nested infobox with the attribute "embed=yes", which may be a strong signal for making a segmentation.

https://en.wikipedia.org/wiki/Template:Infobox_NRHP#Embedding says:

In order to embed this infobox, first look at the pre-existing infobox; it should provide a "module" or an "embedded" parameter, inside of which the NRHP should be placed. If the pre-existing infobox does not have the capacity for embedding, do not insert the NRHP infobox in any other field, but request that a parameter for embedding be added on the talk page of the other infobox.

So, the way this embedding is being done is not the recommended way. Hopefully, getting a fix to Infobox Museum might solve this problem in the interim.