Parsoid HTML: Vertical list rendered as a collapsed hlist on this enwikivoyage page
Closed, Resolved · Public

Description

See https://en.wikivoyage.org/wiki/Tbilisi?useparsoid=1#By_bus_or_marshrutka and click the 'Expand' links on the right under the map.

Compare the same on the non-parsoid version: https://en.wikivoyage.org/wiki/Tbilisi?useparsoid=0#By_bus_or_marshrutka

Event Timeline

The wikitext is:

{{legend|#ff15f3|Zone 1: Western Georgia and Rustavi}}
<div class="mw-collapsible mw-collapsed">
* '''Akhaltsikhe-Borjomi'''
* '''Batumi'''
* Kaspi-Metekhi-Akhalkalaki
* Khashuri/Surami
* Khashuri-Zestaponi-'''Kutaisi'''
* Rustavi
* Tsilkani/Ereda
* Ureki-Kobuleti-Sarpi
* '''Zugdidi'''
</div>
{{legend|#ffff00|Zone 2: [[Mtskheta-Mtianeti]] region}}

So I'm not sure why we're not parsing the * as a list; maybe we're not in indent-pre mode (why?) or maybe there's an error earlier on the page which is throwing off the tokenizer?

Arlolra renamed this task from Parsoid HTML: Possible CSS issue - vertical list rendered as a collapsed hlist on this newikkivoyage page to Parsoid HTML: Possible CSS issue - vertical list rendered as a collapsed hlist on this enwikkivoyage page. Mar 12 2024, 8:36 PM

I think the wikitext you're looking for is:

{{mapframe|41.7506103|44.7786187|zoom=17|align=right|show=didube|name=Didube bus station:
* Khashuri/Surami
}}

which is reproducibly different at https://en.wikivoyage.org/wiki/User:Arlolra/sandbox?useparsoid=1

That expands to:

<mapframe align="right" height="420" image="" latitude="41.7506103" longitude="44.7786187" show="didube" text="&lt;div class=&quot;magnify&quot; title=&quot;Enlarge map&quot;&gt;&lt;maplink class=&quot;no-icon&quot; image=&quot;&quot; latitude=&quot;41.7506103&quot; longitude=&quot;44.7786187&quot; show=&quot;mask,around,buy,city,do,event,drink,eat,go,listing,other,see,sleep,vicinity,view,black,blue,brown,chocolate,forestgreen,gold,gray,grey,lime,magenta,maroon,mediumaquamarine,navy,red,royalblue,orange,silver,steelblue,teal,fuchsia,route1,route2,route3,route4,route5&quot; title=&quot;&quot; url=&quot;&quot; zoom=&quot;18&quot;&gt;&lt;/maplink&gt;&lt;/div&gt;Didube bus station:
* Khashuri/Surami" title="" url="" width="420" zoom="17"></mapframe>

The difference is the newline in the attribute value. Parsoid normalizes it, as the legacy parser would if it encountered the literal tag:
https://github.com/wikimedia/mediawiki-services-parsoid/blob/master/src/Wt2Html/TT/ExtensionHandler.php#L45
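To illustrate why that matters for the list, here is a minimal Python sketch. It is not Parsoid's code: `normalize_attr_whitespace` and `has_list_item` are hypothetical helpers, and the sketch assumes that normalization collapses each run of whitespace (newlines included) into a single space. Once the newline is gone, the `*` no longer sits at the start of a line, so it cannot open a list item.

```python
import re


def normalize_attr_whitespace(value: str) -> str:
    """Collapse each run of spaces, tabs and newlines into one space, then trim.

    Assumption: this approximates the attribute normalization discussed above;
    it is not copied from Parsoid or the legacy parser.
    """
    return re.sub(r"[ \t\r\n]+", " ", value).strip()


def has_list_item(wikitext: str) -> bool:
    """Return True if any line starts with '*', as a wikitext bullet requires."""
    return any(line.startswith("*") for line in wikitext.split("\n"))


# The tail of the text="..." attribute value from the expansion above.
raw = "Didube bus station:\n* Khashuri/Surami"
normalized = normalize_attr_whitespace(raw)

print(repr(normalized))
print("raw has list item:", has_list_item(raw))
print("normalized has list item:", has_list_item(normalized))
```

With the raw value the bullet is recognized; with the normalized value the asterisk ends up mid-line and would be treated as plain text.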

However, the legacy parser gets a direct call from Lua to the parser function, so the attributes presumably aren't sanitized the same way:
https://en.wikivoyage.org/wiki/Module:Map

This is annoying in that sanitization / normalization behavior shouldn't depend on the particular code path that is taken. So, Parsoid's behavior is correct here. On that consistency note, https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/981649 is something we should pick up in Parsoid and get in.

I am tempted to say this is a won't-fix, but do we have an alternative we can offer editors that lets them get the rendering they want here?

An alternative would be to not sanitize newlines in attribute values anywhere, but I'm not sure what kind of breakage that would cause with extension uses in the wild, so it's probably a no-go for now.

I updated Arlo's test page to show the difference in rendering even with the legacy parser.

ssastry renamed this task from Parsoid HTML: Possible CSS issue - vertical list rendered as a collapsed hlist on this enwikkivoyage page to Parsoid HTML: Possible CSS issue - vertical list rendered as a collapsed hlist on this enwikivoyage page. Mon, Apr 8, 4:51 PM
ssastry renamed this task from Parsoid HTML: Possible CSS issue - vertical list rendered as a collapsed hlist on this enwikivoyage page to Parsoid HTML: Vertical list rendered as a collapsed hlist on this enwikivoyage page.

An option that *might* work (beware, it's kind of ugly) is to make $extApi->frame->getSrcText available to extensions. Equipped with this and the token offsets (which are already available through $extArgs), they could conceptually get back to the source and do whatever they want there.
Alternatively, and possibly less ugly: in the ExtensionHandler, when doing the normalization in normalizeExtOptions, save both the raw value and the normalized version. This would avoid breakage of existing extensions while giving the opportunity to extensions to go back to the raw value if they really really want to (and then have Kartographer use that to be able to parse the list).
(Now that I think about it, I think I have a preference for the second one - it's a more "public/clear" endpoint, but at the same time it's probably less ripe for abuse).
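A rough Python sketch of the second option, to make the shape concrete. Everything here is hypothetical (`ExtAttribute`, `make_attribute`, `caption_wikitext`); it is not Parsoid's or Kartographer's API, and the whitespace-collapsing rule is an assumption. The point is only that existing consumers keep reading the normalized value while an extension that cares about newlines can opt into the raw one.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtAttribute:
    """Hypothetical container that keeps both forms of one tag attribute."""
    name: str
    raw: str         # value as written in the source, newlines preserved
    normalized: str  # value after whitespace normalization


def make_attribute(name: str, raw: str) -> ExtAttribute:
    """Build an attribute, storing the raw value alongside the normalized one.

    Assumption: normalization collapses whitespace runs to a single space.
    """
    normalized = re.sub(r"[ \t\r\n]+", " ", raw).strip()
    return ExtAttribute(name=name, raw=raw, normalized=normalized)


def caption_wikitext(attr: ExtAttribute) -> str:
    """An extension that parses the value as wikitext would pick the raw form."""
    return attr.raw


text_attr = make_attribute("text", "Didube bus station:\n* Khashuri/Surami")

# Existing extensions are unaffected: they still see the normalized value.
print(repr(text_attr.normalized))
# An extension that needs the list reads the raw value instead.
print(repr(caption_wikitext(text_attr)))
```

Keeping both values on one object avoids handing extensions the page source and offsets, which is the part of the first option that seems ripe for abuse.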

Relatedly, investigating this makes me think that we may miss corner cases on T362034 considering the amount of comments in that normalization vicinity that go "beware, srcOffsets dragons" 🤔

Hah, it turns out we actually ALREADY have the raw value in there :D this MIGHT be an easy fix. MIGHT.

Change #1020289 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/extensions/Kartographer@master] Interpret tag arguments raw values

https://gerrit.wikimedia.org/r/1020289

Change #1020289 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Interpret tag arguments raw values

https://gerrit.wikimedia.org/r/1020289