
inline data-parsoid found in indicator HTML
Open, Medium, Public

Description

For example, the output of https://en.wikipedia.org/wiki/Hampi_(town)?useparsoid=1 contains inline data-parsoid in the indicator HTML. It is not being stripped when the pagebundle format is requested.

Event Timeline

ssastry triaged this task as Medium priority. Jul 21 2023, 7:54 PM
ssastry added a project: Essential-Work.

As it turns out, this is a known issue. See this FIXME which I added in an earlier patch. To deal with this automatically, I had https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/932037 but @Arlolra found that objectionable.

So we either need to resolve that argument or write custom traversal code for saveDataParsoid.
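To make the "custom traversal" option concrete, here is a rough, language-neutral sketch in Python (not Parsoid's PHP, and not the actual saveDataParsoid code). It assumes, purely for illustration, that embedded HTML is stored as a string under an `html` key somewhere inside a decoded JSON attribute value. The helper names `strip_fragment` and `strip_embedded` are hypothetical. The sketch walks the structure, re-serializes each embedded fragment without its data-parsoid attributes, and collects the removed values so they could be moved elsewhere (for example into a pagebundle).

```python
from html import escape
from html.parser import HTMLParser


class _Stripper(HTMLParser):
    """Re-serializes an HTML fragment, dropping data-parsoid attributes."""

    def __init__(self, collected):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.collected = collected  # removed data-parsoid values end up here

    def _open(self, tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:
            if name == "data-parsoid":
                self.collected.append(value)
                continue
            if value is None:
                parts.append(name)
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ("/>" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on output.
        self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def strip_fragment(html, collected):
    """Return the fragment with all data-parsoid attributes removed."""
    parser = _Stripper(collected)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def strip_embedded(node, collected):
    """Walk decoded JSON; rewrite any string stored under an 'html' key.

    The 'html' key is an assumption made for this sketch.
    """
    if isinstance(node, dict):
        return {
            k: strip_fragment(v, collected)
            if k == "html" and isinstance(v, str)
            else strip_embedded(v, collected)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [strip_embedded(v, collected) for v in node]
    return node
```

This re-serializes the fragment rather than editing it in place, so it is not byte-for-byte faithful (raw-text elements such as script and style are not handled specially). A real fix would work on the DOM and would need to know every place where embedded HTML can occur.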

This would be fixed by the rich attributes patch: T339927: Rich Attribute Support in Parsoid. By keeping the document fragment in the attribute live, the data-parsoid for the embedded content is handled the same way all the other data-parsoid is handled.

This also applies to LanguageConverter markup, which has similar inline data-parsoid. It might apply to gallery captions as well, although Arlo might have independently hacked around that.