Page MenuHomePhabricator

Ensure ParserOutput can always be combined asynchronously/out-of-order
Open, MediumPublic

Description

Right now a number of extensions treat the ParserOutput as a read-write store, reading a previous value from the ParserOutput, updating it, and then storing the updated value back. This is incompatible with the separate/asynchronous/incremental parsing model on the Parsoid roadmap, the "write-only" model of the Parsoid ContentMetadataCollector interface, and the proposed asynchronous-parsing aspect of the WikiFunctions feature.

Methods which assume a destructive update should be replaced with append and set variants which are guaranteed to be write-only. (The "set" will throw an exception if a different value is stored for a key, although use as a flag where multiple places set the same value is permitted; "append" collects all values in a set.)

Any post-processing (to select the minimum value set, the last value set in wikitext order, etc) should be done when the collected results are transferred to the OutputPage, not done in the ParserOutput itself (ie, in the OutputPageParserOutput hook).

Related Objects

StatusSubtypeAssignedTask
OpenReleaseNone
OpenNone
OpenNone
OpenNone
OpenFeatureNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
Opencscott
OpenNone
Opencscott
Opencscott
Opencscott
Opencscott
Opencscott

Event Timeline

Change 759837 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Use uniform representation for ParserOutput \"index policy\"

https://gerrit.wikimedia.org/r/759837

Change 762008 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Provide method to merge a ParserOutput into a ContentMetadataCollector

https://gerrit.wikimedia.org/r/762008

Change 762008 merged by jenkins-bot:

[mediawiki/core@master] Provide method to merge a ParserOutput into a ContentMetadataCollector

https://gerrit.wikimedia.org/r/762008

Change 759837 merged by jenkins-bot:

[mediawiki/core@master] Use uniform representation for ParserOutput "index policy"

https://gerrit.wikimedia.org/r/759837

Change 804310 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] [EXPERIMENTAL] deprecate mutation of ToC in OutputPageParserOutput hook

https://gerrit.wikimedia.org/r/804310

ssastry triaged this task as Medium priority.Jun 14 2022, 10:37 PM
ssastry added a project: Parsoid-Read-Views.

I don't have anything to back it up, but i feel like a really common use case, would be an extension wants to store some key value pairs under its own key as a namespace.

e.g. in the old version, you would do

$metadata = $parser->getExtensionData( 'MyExtension' ) ?: [];
// Assumption here is that either $someKey doesn't exist, or if it does, the extension does not care which write wins.
$metadata[$someKey] = $someValue;
$metadata = $parser->setExtensionData( 'MyExtension' );

The new appendExtensionData doesn't seem to cover that usecase.

I don't have anything to back it up, but i feel like a really common use case, would be an extension wants to store some key value pairs under its own key as a namespace.

e.g. in the old version, you would do

$metadata = $parser->getExtensionData( 'MyExtension' ) ?: [];
// Assumption here is that either $someKey doesn't exist, or if it does, the extension does not care which write wins.
$metadata[$someKey] = $someValue;
$metadata = $parser->setExtensionData( 'MyExtension' );

The new appendExtensionData doesn't seem to cover that usecase.

Yes, you should prefix the namespace. eg:

$parser->setExtensionData( "MyExtension-$someKey", $someValue);

That is a bit more robust than trying to mutate an entry in an array value.

Change #804310 abandoned by C. Scott Ananian:

[mediawiki/core@master] Deprecate mutation of ToC in OutputPageParserOutput hook

Reason:

We ended up removing setTOCHTML(), making this patch moot.

https://gerrit.wikimedia.org/r/804310