Ensure <meta typeof="..."> in Parser/Parsoid HTML can't be spoofed from wikitext
Open, Medium, Public

Description

The core sanitizer generally allows <meta> tags in wikitext as long as they have content and itemprop attributes. That means markup like the following might pass intact from wikitext into HTML and then confuse the Parsoid html2wt code:

<meta typeof="mw:Annotation/translate" itemprop="foo" content="bar">
<meta property="mw:PageProp/toc" itemprop="foo" content="bar">

Parsoid contains code in the tokenizer to remap typeof attributes, but it's not clear that the new Annotation code uses those pathways. The page property metas don't use typeof at all (they use property), so remapping typeof alone does not protect them, which matters for ToC spoofing. And of course, none of the Parsoid remapping, done in Parsoid's copy of the Sanitizer, is done in core's copy of the Sanitizer (yet); see T248211: One Sanitizer to Rule Them All and T247804: Move Sanitizer from core into Parsoid.

The MediaWiki DOM Spec briefly mentions that "User-supplied RDFa with the mw prefix is moved to a non-clashing prefix in Parsoid", but I don't think we document anywhere (except in code) exactly how that mapping is done.
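
To make the idea of "moving to a non-clashing prefix" concrete, here is a minimal Python sketch of one possible remapping. It is not Parsoid's or core's actual code: the attribute list, the function names, and the replacement prefix `mw-user:` are all hypothetical and chosen only for illustration. The real mapping lives in the Sanitizer code and is exactly what this task asks to document.

```python
# Attributes treated as whitespace-separated RDFa token lists.
# ASSUMPTION: this list is illustrative, not the set Parsoid actually handles.
RDFA_TOKEN_ATTRS = {"typeof", "property", "rel", "rev"}

RESERVED_PREFIX = "mw:"
# ASSUMPTION: hypothetical non-clashing prefix; the real one is not stated in the task.
USER_PREFIX = "mw-user:"


def remap_reserved_tokens(value):
    """Rewrite every token starting with the reserved prefix to the user prefix."""
    remapped = []
    for token in value.split():
        if token.lower().startswith(RESERVED_PREFIX):
            remapped.append(USER_PREFIX + token[len(RESERVED_PREFIX):])
        else:
            remapped.append(token)
    return " ".join(remapped)


def sanitize_meta_attrs(attrs):
    """Return a copy of attrs with reserved tokens remapped in RDFa attributes only."""
    return {
        name: remap_reserved_tokens(value)
        if name.lower() in RDFA_TOKEN_ATTRS
        else value
        for name, value in attrs.items()
    }
```

Applied to the two examples above, this sketch would leave itemprop and content untouched while rewriting typeof="mw:Annotation/translate" and property="mw:PageProp/toc" so they no longer collide with Parsoid-generated markers. Handling property alongside typeof is the point of the sketch, since the page property metas are not covered by a typeof-only remapping.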

So this task is to:

  • decide on a uniform attribute sanitization/remapping process to ensure that Parsoid <meta> tags aren't spoofable from wikitext content, while allowing wikitext content maximum flexibility for authoring non-conflicting <meta> tags (see T48826: Sanitizer breaks microdata).
  • implement it both in the core Sanitizer and Parsoid (or in the T248211: One Sanitizer to Rule Them All)
  • document this remapping clearly in the MediaWiki DOM Spec
  • add test cases to show that spoofing isn't possible and protect against future regressions
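
As a rough illustration of the last bullet, a regression check could scan rendered HTML for <meta> tags that still carry a reserved mw: value. The Python sketch below uses only the standard library; the names `ReservedMetaFinder` and `find_reserved_metas` and the choice of checked attributes are assumptions made for this example. In a real test the HTML would be the parser's output for wikitext containing only user-supplied metas (Parsoid's own metas would also match), whereas here it is passed in as a literal string.

```python
from html.parser import HTMLParser

# ASSUMPTION: the attributes worth checking; taken from the two examples in this task.
RESERVED_ATTRS = ("typeof", "property")


class ReservedMetaFinder(HTMLParser):
    """Collect (attribute, value) pairs on <meta> tags that use the mw: prefix."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        for name, value in attrs:
            if name in RESERVED_ATTRS and value:
                if any(tok.lower().startswith("mw:") for tok in value.split()):
                    self.hits.append((name, value))


def find_reserved_metas(html):
    """Return all reserved-prefix meta attributes found in the given HTML."""
    finder = ReservedMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.hits
```

A spoofing test would then assert that `find_reserved_metas(rendered_html)` is empty for each wikitext input that tries to inject a reserved meta.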

See comments on https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/702996 for some places which likely need attention.

Event Timeline

add test cases to show that spoofing isn't possible and protect against future regressions

The parserTest "Strip reserved data attributes" already exists as a starting point.

Arlolra triaged this task as Medium priority. Nov 5 2021, 7:01 PM
Arlolra moved this task from Needs Triage to Tech Debt / Big changes on the Parsoid board.

@cscott: Removing task assignee as this open task has been assigned for more than two years - see the email sent to all task assignees on 2024-04-15.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator. Thanks!