Page MenuHomePhabricator

Figure out how Parsoid will work with MCR slots to support both reading and editing clients
Open, Needs TriagePublic

Description

While addressing T332931: Read views HTML and Edit views HTML for a page should come from the same ParserCache entry, we broke visual editing for pages with multiple MCR slots (see T351026: VisualEditor adding nonsense code to file pages ).

To address T351026, we are currently deploying a hack where we always return the 'main' slot output for Parsoid HTML requests. This will however now break Parsoid read view for these pages since we will only be returning the main slot output.

We should figure out a strategy (internal and external API fixes) that lets editing and reading clients get the appropriate output for pages that have multiple MCR slots filled up.

Event Timeline

Change 973823 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/core@master] Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set

https://gerrit.wikimedia.org/r/973823

Change 973797 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/core@wmf/1.42.0-wmf.4] Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set

https://gerrit.wikimedia.org/r/973797

Change 973823 merged by jenkins-bot:

[mediawiki/core@master] Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set

https://gerrit.wikimedia.org/r/973823

Change 973797 merged by jenkins-bot:

[mediawiki/core@wmf/1.42.0-wmf.4] Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set

https://gerrit.wikimedia.org/r/973797

Mentioned in SAL (#wikimedia-operations) [2023-11-13T21:35:59Z] <urbanecm@deploy2002> Started scap: Backport for [[gerrit:969401|mobile: Add MobileUrlCallback (T257852)]], [[gerrit:973797|Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set (T351026 T351113)]]

Mentioned in SAL (#wikimedia-operations) [2023-11-13T21:37:17Z] <urbanecm@deploy2002> urbanecm and ssastry and tgr: Backport for [[gerrit:969401|mobile: Add MobileUrlCallback (T257852)]], [[gerrit:973797|Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set (T351026 T351113)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-11-13T21:54:33Z] <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:969401|mobile: Add MobileUrlCallback (T257852)]], [[gerrit:973797|Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set (T351026 T351113)]] (duration: 18m 34s)

Cross-linking for visibility: the current hack means that SDC data (from the mediainfo slot) is not visible on Parsoid page views – see T364228. (I uploaded a proposed config change to disable ParserMigration on Commons, so that users aren’t confused by the invisible data and potentially add duplicate data.)

For now, one solution here is for us to figure out a HTML representation that lets us extract HTML for a specific slot from the combined slot output that is stored in ParserOutput. Alternatively, we would have to store slot output separately in ParserOutput in a json array -- but that leads to encoding bloat and encoding/decoding overheads. But, in either case, once we agree on the representation for this in ParserOutput, when VE asks to edit the page, we can extract the main slot output here. Ideally, VE would also ask for the slot it wants to edit in which case, we wouldn't have to hardcode this either.

That said, ParserOutput already has a json object, so maybe I am worried about nothing here.

I like the idea of storing one ParserOutput object per slot, and combinding then on the fly.

See T227776: General ParserCache service class for large "current" page-derived data for some earlier thoughts around this topic.

The logic for combining slot html in ContentRenderer (I think) is probably pretty broken from an editing standpoint. Probably the composition should be moved to OutputTransform and the separate slot content should be stored separately in ParserOutput, which is more or less what @subbu and @daniel is proposing -- except @subbu is saying one ParserOutput contains many slots and @daniel is saying ParserCache should be "per slot" and the combining ParserOutputs into a final page should be post-cache. I think!

Anyway, the alternative would be to compose the HTML for the slots in a reversible way so you could strip out all but the main slot for editing...but I think that works around the original goal for MCR which would be to allow arbitrary composition of slots, not restrict the manner in which they are allowed to be composed.