Page MenuHomePhabricator

ParserOutput should implement ContentMetadataCollector
Closed, ResolvedPublic

Description

(BIKESHED WARNING: Name of class can/has been bikeshedded.)

Parsoid needs to get access to a number of methods of ParserOutput in core, but the full ParserOutput class has too many dependencies on core to cleanly extract.

In particular, Parsoid needs to do a bunch of book-keeping to track page dependencies, so that Parsoid can be used in the post-edit hook to update these dependency tables.

The proposal is to add a new clean interface in Parsoid, something like:

interface ContentMetadataCollector {
function addCategory(...);
function setIndicator(....);
function setTitleText(...);
function addLanguageLink(...);
function addWarning(...);
function addExternalLink(...);
function addLink(...);
function addImage(...);
function addTemplate(...);
// etc
}

and have ParserOutput in core implement this interface:

class ParserOutput extends CacheTime implements ContentMetadataCollector {
 ...
}

Not every method in ParserOutput will be added to ContentMetadataCollector! ContentMetadataCollector is intended to be more-or-less "write-only" and none of the getters will be added to ContentMetadataCollector, nor will some of the non-bookkeeping functions (like ParserOutput::isLinkInternal -- why is that even there?).

Further, all method related to HTML output (::getText(), ::setTOCHTML() etc) are removed. Parsoid represents its output as a DOM (and related metadata) and uses PageBundle for this purpose. The getText methods are a mess, any way.

Some of the setters might need to be tweaked a bit if they use MW-specific classes; for example ParserOutput::addLink takes a Title as an argument, and the version of that in ContentMetadataCollector will need to be changed to take a dbKey string (like addImage does). [EDIT: @Pchelolo suggests some form of LinkTarget class.)

When Parsoid is run in standalone mode it will be provided with an implementation of ContentMetadataCollector (src/Config/Api/StandaloneContentMetadataCollector.php) which basically does nothing on every call . We might eventually extend this to record some of the information in standalone mode, but only to allow us to write better tests. For example, the parser test infrastructure contains a mode where it dumps all the categories as plain text; we might implement ContentMetadataCollector::addCategory() to store the information somewhere to support that mode of parser tests. (Adding a proxy ContentMetadataCollector to record information off to the side might also be a way to compare results of the legacy parser and parsoid in production.)

When Parsoid is run in integrated mode, the ContentMetadataCollector object it will be given will be a "real" ParserOutput. Hooks and extensions which expect to have access to a real ParserOutput will thus have access to one via typecast, as long as they are being run in integrated mode (which by definition they almost certainly are because the hook/extension mechanism is part of integrated mode).

Note 1: Methods will need to be added to ContentMetadataCollector in Parsoid with care to avoid breaking ParserOutput in core. Should be straightforward to commit the new method to core first, then update mediawiki-vendor to reference the latest parsoid with the new method in its ContentMetadataCollector interface. Worst-case we'll have to temporarily rename methods to allow signature changes to be done in multiple steps.

Note 2: There is some limit tracking code in core's Parser class (for example Parser::incrementExpensiveFunctionCount) which properly belongs in ParserOutput/ContentMetadataCollector so that Parsoid can bump the limits as it parses. See also Parser::makeLimitReport. Moving these functions from Parser to ParserOutput can be done as a non-blocking subtask. Once the methods are in ParserOutput then can easily be added to ContentMetadataCollector and then parsoid can call them.

Note 3: https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/623032 is old but was an first step towards this task.

Details

SubjectRepoBranchLines +/-
mediawiki/coremaster+7 -2
mediawiki/coremaster+5 -4
mediawiki/vendormaster+7 K -5 K
mediawiki/services/parsoidmaster+292 -0
mediawiki/extensions/GeoCrumbsmaster+2 -2
mediawiki/extensions/GeoCrumbswmf/1.38.0-wmf.5+2 -2
mediawiki/extensions/LiquidThreadsmaster+2 -2
mediawiki/extensions/Wikibasemaster+3 -3
mediawiki/coremaster+4 -1
mediawiki/extensions/CollaborationKitmaster+2 -2
mediawiki/coremaster+46 -11
mediawiki/coremaster+9 -8
mediawiki/extensions/NoCatmaster+1 -1
mediawiki/extensions/PageInCatmaster+2 -2
mediawiki/extensions/BlueSpiceEchoConnectormaster+2 -2
mediawiki/extensions/Wikibasemaster+42 -42
mediawiki/extensions/WikidataPageBannermaster+4 -4
mediawiki/extensions/NoCatmaster+2 -2
mediawiki/extensions/Scoremaster+3 -3
mediawiki/extensions/LiquidThreadsmaster+3 -3
mediawiki/extensions/Citemaster+3 -3
mediawiki/extensions/TemplateDatamaster+3 -3
mediawiki/extensions/ProofreadPagemaster+2 -2
mediawiki/extensions/PageTriagemaster+2 -2
mediawiki/extensions/MobileFrontendmaster+8 -8
mediawiki/extensions/Kartographermaster+5 -5
mediawiki/extensions/JsonConfigmaster+2 -2
mediawiki/coremaster+1 -0
mediawiki/extensions/WikibaseMediaInfomaster+2 -2
mediawiki/extensions/TimedMediaHandlermaster+2 -2
mediawiki/extensions/Flowmaster+1 -1
mediawiki/coremaster+118 -62
mediawiki/coremaster+136 -48
mediawiki/coremaster+1 -0
mediawiki/coremaster+18 -4
Show related patches Customize query in gerrit

Related Objects

StatusSubtypeAssignedTask
OpenReleaseNone
OpenNone
OpenNone
OpenNone
OpenFeatureNone
OpenNone
OpenNone
OpenNone
Resolvedssastry
StalledNone
Resolvedssastry
OpenNone
Resolvedcscott
Resolvedcscott
ResolvedPRODUCTION ERRORLucas_Werkmeister_WMDE
ResolvedPRODUCTION ERRORabi_
ResolvedPRODUCTION ERRORcscott
ResolvedPRODUCTION ERRORroman-stolar
ResolvedNone
ResolvedNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 728483 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Score@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728483

Change 728486 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/WikidataPageBanner@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728486

Change 727401 merged by jenkins-bot:

[mediawiki/core@master] Rename ParserOutput::{get,set,unset}Property to {get,set,unset}PageProperty

https://gerrit.wikimedia.org/r/727401

Change 728491 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Rename ParserOutput::setCategoryLinks() to ::setCategories()

https://gerrit.wikimedia.org/r/728491

Change 727610 merged by jenkins-bot:

[mediawiki/extensions/JsonConfig@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727610

Change 727634 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727634

Change 727635 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727635

Change 728464 merged by jenkins-bot:

[mediawiki/extensions/Flow@master] Replace use of deprecated OutputPage::preventClickjacking()

https://gerrit.wikimedia.org/r/728464

Change 728481 merged by jenkins-bot:

[mediawiki/extensions/LiquidThreads@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728481

Change 728500 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/BlueSpiceEchoConnector@master] Rename deprecated ParserOutput::getCategoryLinks()

https://gerrit.wikimedia.org/r/728500

Change 728502 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/NoCat@master] Rename deprecated ParserOutput::setCategoryLinks()

https://gerrit.wikimedia.org/r/728502

Change 728503 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/PageInCat@master] Rename deprecated ParserOutput::getCategoryLinks()

https://gerrit.wikimedia.org/r/728503

Change 728504 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Hard deprecate ParserOutput::{get,set}CategoryLinks()

https://gerrit.wikimedia.org/r/728504

Change 728476 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728476

Change 727612 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727612

Change 728483 merged by jenkins-bot:

[mediawiki/extensions/Score@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728483

Change 727633 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727633

Change 728467 merged by jenkins-bot:

[mediawiki/extensions/TimedMediaHandler@master] Replace use of deprecated OutputPage::allowClickjacking()

https://gerrit.wikimedia.org/r/728467

Change 728469 merged by jenkins-bot:

[mediawiki/extensions/WikibaseMediaInfo@master] Replace use of deprecated OutputPage::preventClickjacking()

https://gerrit.wikimedia.org/r/728469

Change 727636 merged by jenkins-bot:

[mediawiki/extensions/TemplateData@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727636

cscott renamed this task from ParserOutput should implement ContentOutputBuilder to ParserOutput should implement ContentMetadataCollector.Oct 8 2021, 6:12 PM
cscott updated the task description. (Show Details)

Change 728665 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/NoCat@master] Rename deprecated use of ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/728665

Change 728470 merged by jenkins-bot:

[mediawiki/core@master] Hard-deprecate ParserOutput::preventClickjacking()

https://gerrit.wikimedia.org/r/728470

Change 728665 merged by jenkins-bot:

[mediawiki/extensions/NoCat@master] Rename deprecated use of ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/728665

Change 728486 merged by jenkins-bot:

[mediawiki/extensions/WikidataPageBanner@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728486

Change 727403 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Rename usages of deprecated ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727403

Change 727402 merged by jenkins-bot:

[mediawiki/core@master] Hard deprecate the renamed ParserOutput::*Property() methods

https://gerrit.wikimedia.org/r/727402

Change 728491 merged by jenkins-bot:

[mediawiki/core@master] Rename ParserOutput::setCategoryLinks() and ::getCategoryLinks()

https://gerrit.wikimedia.org/r/728491

Change 728504 merged by jenkins-bot:

[mediawiki/core@master] Hard deprecate ParserOutput::{get,set}CategoryLinks()

https://gerrit.wikimedia.org/r/728504

Change 728500 merged by jenkins-bot:

[mediawiki/extensions/BlueSpiceEchoConnector@master] Rename deprecated ParserOutput::getCategoryLinks()

https://gerrit.wikimedia.org/r/728500

Change 728503 merged by jenkins-bot:

[mediawiki/extensions/PageInCat@master] Rename deprecated ParserOutput::getCategoryLinks()

https://gerrit.wikimedia.org/r/728503

Change 728502 merged by jenkins-bot:

[mediawiki/extensions/NoCat@master] Rename deprecated use of ParserOutput::setCategoryLinks()

https://gerrit.wikimedia.org/r/728502

Change 731180 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/CollaborationKit@master] Rename usage of deprecated ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/731180

Change 731180 merged by jenkins-bot:

[mediawiki/extensions/CollaborationKit@master] Rename usage of deprecated ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/731180

Change 732354 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Wikibase@master] Rename usages of deprecated ParserOutput::{get,set}Property() (take 2)

https://gerrit.wikimedia.org/r/732354

Change 732357 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/LiquidThreads@master] Rename use of deprecated ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/732357

Change 732354 abandoned by C. Scott Ananian:

[mediawiki/extensions/Wikibase@master] Rename usages of deprecated ParserOutput::{get,set}Property() (take 2)

Reason:

Abandon in favor of I70706415ae657f6783b7ae78e2dbe1b96ca31dbe

https://gerrit.wikimedia.org/r/732354

Change 732357 abandoned by C. Scott Ananian:

[mediawiki/extensions/LiquidThreads@master] Rename use of deprecated ParserOutput::getProperty()

Reason:

Abandon in favor of I78b15a6e12c462eb7f3c75b2549801e8324bc591

https://gerrit.wikimedia.org/r/732357

Change 732360 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/GeoCrumbs@master] Replace use of deprecated ParserOutput:getProperty()

https://gerrit.wikimedia.org/r/732360

Change 732333 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/GeoCrumbs@wmf/1.38.0-wmf.5] Replace use of deprecated ParserOutput:getProperty()

https://gerrit.wikimedia.org/r/732333

Change 732333 merged by jenkins-bot:

[mediawiki/extensions/GeoCrumbs@wmf/1.38.0-wmf.5] Replace use of deprecated ParserOutput:getProperty()

https://gerrit.wikimedia.org/r/732333

Change 732360 merged by jenkins-bot:

[mediawiki/extensions/GeoCrumbs@master] Replace use of deprecated ParserOutput:getProperty()

https://gerrit.wikimedia.org/r/732360

Change 726944 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Add ContentMetadataCollector interface

https://gerrit.wikimedia.org/r/726944

Change 726944 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add ContentMetadataCollector interface

https://gerrit.wikimedia.org/r/726944

Change 742570 had a related patch set uploaded (by Sbailey; author: Sbailey):

[mediawiki/vendor@master] Bump Parsoid to 0.15.0-a11

https://gerrit.wikimedia.org/r/742570

Change 742570 merged by jenkins-bot:

[mediawiki/vendor@master] Bump Parsoid to 0.15.0-a11

https://gerrit.wikimedia.org/r/742570

Change 748830 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] ParserOutput now extends Parsoid's ContentMetadataCollector interface

https://gerrit.wikimedia.org/r/748830

Change 752230 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] ParserOutput: implement the abstract ContentMetadataCollector interface

https://gerrit.wikimedia.org/r/752230

Change 748830 abandoned by C. Scott Ananian:

[mediawiki/core@master] ParserOutput now implements Parsoid's ContentMetadataCollector interface

Reason:

Duplicate of I15c0e81185b9957fe097c82e6609a200742ee7d1

https://gerrit.wikimedia.org/r/748830

Change 757905 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Deprecate ParserOutput::setPageProperty with conflicting values

https://gerrit.wikimedia.org/r/757905

Worth noting that displaytitle and defaultsort aka {{DISPLAYTITLE}} and {{DEFAULTSORT}} are maintained as page properties. In addition to warning about multiple use on a page, we might also want to eventually maintain the conflicting values as a set, and arbitrarily choose the lexicographically first at page output time. That provides an async-safe composition mechanism, while being identical to current behavior in all pages which *don't* attempt to set conflicting values.

Change 752230 merged by jenkins-bot:

[mediawiki/core@master] ParserOutput: implement the abstract ContentMetadataCollector interface

https://gerrit.wikimedia.org/r/752230

cscott closed this task as Resolved.EditedMar 31 2022, 3:04 PM
cscott claimed this task.

Closing this task since ParserOutput does implement CMC now, although a number of subtasks are still open covering various improvements to the CMC interface. Opened T305159: Improve coverage of the ContentMetadataCollector interface for those.