Page MenuHomePhabricator

Rendering diff from Parsoid not showing the "geodata-multiple-primary" error (visual diff testing)
Open, Needs TriagePublic

Description

https://cs.wikivoyage.org/wiki/Auschwitz-Birkenau?useparsoid=0
https://cs.wikivoyage.org/wiki/Auschwitz-Birkenau?useparsoid=1

Screenshot 2024-08-15 at 6.47.07 PM.png (210×1 px, 72 KB)

Screenshot 2024-08-15 at 6.47.22 PM.png (120×1 px, 49 KB)

From,

{{Geo|50.035833|19.178333}}
{{Coord|50.035833|19.178333}}
{{Coord|50|2|8.999|N|19|10|41.999|W|display=title}}

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Notes after investigation:

  • GeoData has a "getFromParserOutput" method that does a getExtensionData on a ParserOutput, relying on state and on the fact that we're keeping the same ParserOutput in the parse (with Parsoid we don't).
  • This has an impact on error display ("more than one primary coordinates" and "too many coordinates defined") and, in case of error, on the wgCoordinates javascript variable, which gets updated in Parsoid but not on legacy.
  • It also has an impact on multiple "secondary" coordinates: only one of them is added to a Parsoid ParserOutput. I do not know where these are used for a read-view parse.
  • GeoData DOES have database storage. that database storage is modified on the LinksUpdateComplete hook. As far as I can tell, that hook relies on a legacy parse, it explicitly passes no option on said parse to get a "canonical" output. I do not know if this is on purpose or if this will change at any point. For the time being, it means that the database storage is NOT modified by the use of Parsoid read views.
  • Same remark for cirrussearch indexing, there's a hook in there to update the index, but as far as I can tell this also relies on a legacy parse. same remark, same consequences.

The "multiple secondary coordinates" issue could probably be handled inline by tweaking the ParserOutput extensiondata handling. I have more difficulty seeing how to handle the errors that trigger around "counting/keeping state" during the regular parse; I suppose that moving these to a post-processing pass would be the play here. That said, if a post-processing pass is used for error handling anyway, it may be worth handling the multiple secondary coordinates in post as well, so that the behaviour can align with a legacy "single parse" behaviour.

Gehel subscribed.

This does not seem to require direct attention from the Search Platform team. Ping us if I'm misunderstanding the context.

ihurbain subscribed.