Page MenuHomePhabricator

Parsoid DISPLAYTITLE handling is broken/gone
Closed, ResolvedPublic

Description

Parsoid's PrepareDOM pass has commented-out handling for DISPLAYTITLE:

// Set title to display when present (last one wins).
if ( DOMCompat::nodeName( $node ) === 'meta'
	&& $node->getAttribute( 'property' ) === 'mw:PageProp/displaytitle'
) {
	// PORT-FIXME: Meh
	// $env->getPageConfig()->meta->displayTitle = $node->getAttribute( 'content' ) ?? '';
}

I'm removing this code and the whole PrepareDOM pass, so some other place will have to be found for this, if we want it to start working again.

In the old parser, DISPLAYTITLE is one of a large number of secondary data items that are returned in ParserOutput. I think it would be most efficient to store these kinds of page properties in the DataBag as the meta tags are generated. In a hypothetical future in which template HTML fragments are persistently cached, it may be necessary to scan the fragments for meta tags. But for now, it doesn't make sense to me to generate a meta tag, forget about it, and then search the whole DOM again looking for it.

Event Timeline

Arlolra triaged this task as Medium priority.Nov 1 2021, 5:52 PM
Arlolra moved this task from Needs Triage to Missing Functionality on the Parsoid board.

__DISPLAYTITLE__ is "just" a magic variable, so the implementation should be moved to CoreMagicVariables and then it will Just Work when Parsoid calls that same implementation.

I don't feel strongly that this needs to be handled in standalone mode, aka not necessary to explicitly include it in the page bundle. T327173#8641167 has a somewhat similar discussion; in that case whether cache lifetime should be in the page bundle -- I think cache lifetime has a better case for being present in REST API responses than displaytitle does.

A somewhat orthogonal view, is that ContentMetadataCollector is replacing pagebundle as the canonical way to encode "everything other than HTML" in the output of the parser. To the extent that we want to also include this metadata in (for example) REST or Parse API replies, we should perhaps be thinking about how to scalably/uniformly embed ParserOutput metadata in those replies, rather than about exposing it directly. IE let's first put DISPLAYTITLE into the ParserOutput (ideally using a refactored version of the code from the legacy parser), *then* export all of the ParserOutput in the page bundle.

Change 908936 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/core@master] WIP: ParserOutput: Ensure page title is updated after merging properties

https://gerrit.wikimedia.org/r/908936

Change 908937 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] WIP: Remove special support for DISPLAYTITLE

https://gerrit.wikimedia.org/r/908937

Change 908936 merged by jenkins-bot:

[mediawiki/core@master] ParserOutput: Ensure page title is updated after merging properties

https://gerrit.wikimedia.org/r/908936

Change 908937 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Remove special handling for DISPLAYTITLE & DEFAULTSORT

https://gerrit.wikimedia.org/r/908937

Change 919886 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a10

https://gerrit.wikimedia.org/r/919886

Change 919886 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a10

https://gerrit.wikimedia.org/r/919886