Page MenuHomePhabricator

ParserOutput should implement ContentMetadataCollector
Open, Needs TriagePublic

Description

(BIKESHED WARNING: Name of class can/has been bikeshedded.)

Parsoid needs to get access to a number of methods of ParserOutput in core, but the full ParserOutput class has too many dependencies on core to cleanly extract.

In particular, Parsoid needs to do a bunch of book-keeping to track page dependencies, so that Parsoid can be used in the post-edit hook to update these dependency tables.

The proposal is to add a new clean interface in Parsoid, something like:

interface ContentMetadataCollector {
function addCategory(...);
function setIndicator(....);
function setTitleText(...);
function addLanguageLink(...);
function addWarning(...);
function addExternalLink(...);
function addLink(...);
function addImage(...);
function addTemplate(...);
// etc
}

and have ParserOutput in core implement this interface:

class ParserOutput extends CacheTime implements ContentMetadataCollector {
 ...
}

Not every method in ParserOutput will be added to ContentMetadataCollector! ContentMetadataCollector is intended to be more-or-less "write-only" and none of the getters will be added to ContentMetadataCollector, nor will some of the non-bookkeeping functions (like ParserOutput::isLinkInternal -- why is that even there?).

Further, all method related to HTML output (::getText(), ::setTOCHTML() etc) are removed. Parsoid represents its output as a DOM (and related metadata) and uses PageBundle for this purpose. The getText methods are a mess, any way.

Some of the setters might need to be tweaked a bit if they use MW-specific classes; for example ParserOutput::addLink takes a Title as an argument, and the version of that in ContentMetadataCollector will need to be changed to take a dbKey string (like addImage does). [EDIT: @Pchelolo suggests some form of LinkTarget class.)

When Parsoid is run in standalone mode it will be provided with an implementation of ContentMetadataCollector (src/Config/Api/StandaloneContentMetadataCollector.php) which basically does nothing on every call . We might eventually extend this to record some of the information in standalone mode, but only to allow us to write better tests. For example, the parser test infrastructure contains a mode where it dumps all the categories as plain text; we might implement ContentMetadataCollector::addCategory() to store the information somewhere to support that mode of parser tests. (Adding a proxy ContentMetadataCollector to record information off to the side might also be a way to compare results of the legacy parser and parsoid in production.)

When Parsoid is run in integrated mode, the ContentMetadataCollector object it will be given will be a "real" ParserOutput. Hooks and extensions which expect to have access to a real ParserOutput will thus have access to one via typecast, as long as they are being run in integrated mode (which by definition they almost certainly are because the hook/extension mechanism is part of integrated mode).

Note 1: Methods will need to be added to ContentMetadataCollector in Parsoid with care to avoid breaking ParserOutput in core. Should be straightforward to commit the new method to core first, then update mediawiki-vendor to reference the latest parsoid with the new method in its ContentMetadataCollector interface. Worst-case we'll have to temporarily rename methods to allow signature changes to be done in multiple steps.

Note 2: There is some limit tracking code in core's Parser class (for example Parser::incrementExpensiveFunctionCount) which properly belongs in ParserOutput/ContentMetadataCollector so that Parsoid can bump the limits as it parses. See also Parser::makeLimitReport. Moving these functions from Parser to ParserOutput can be done as a non-blocking subtask. Once the methods are in ParserOutput then can easily be added to ContentMetadataCollector and then parsoid can call them.

Note 3: https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/623032 is old but was an first step towards this task.

Details

ProjectBranchLines +/-Subject
mediawiki/coremaster+4 -3
mediawiki/coremaster+5 -4
mediawiki/vendormaster+7 K -5 K
mediawiki/services/parsoidmaster+292 -0
mediawiki/extensions/GeoCrumbsmaster+2 -2
mediawiki/extensions/GeoCrumbswmf/1.38.0-wmf.5+2 -2
mediawiki/extensions/LiquidThreadsmaster+2 -2
mediawiki/extensions/Wikibasemaster+3 -3
mediawiki/coremaster+4 -1
mediawiki/extensions/CollaborationKitmaster+2 -2
mediawiki/coremaster+46 -11
mediawiki/coremaster+9 -8
mediawiki/extensions/NoCatmaster+1 -1
mediawiki/extensions/PageInCatmaster+2 -2
mediawiki/extensions/BlueSpiceEchoConnectormaster+2 -2
mediawiki/extensions/Wikibasemaster+42 -42
mediawiki/extensions/WikidataPageBannermaster+4 -4
mediawiki/extensions/NoCatmaster+2 -2
mediawiki/extensions/Scoremaster+3 -3
mediawiki/extensions/LiquidThreadsmaster+3 -3
mediawiki/extensions/Citemaster+3 -3
mediawiki/extensions/TemplateDatamaster+3 -3
mediawiki/extensions/ProofreadPagemaster+2 -2
mediawiki/extensions/PageTriagemaster+2 -2
mediawiki/extensions/MobileFrontendmaster+8 -8
mediawiki/extensions/Kartographermaster+5 -5
mediawiki/extensions/JsonConfigmaster+2 -2
mediawiki/coremaster+1 -0
mediawiki/extensions/WikibaseMediaInfomaster+2 -2
mediawiki/extensions/TimedMediaHandlermaster+2 -2
mediawiki/extensions/Flowmaster+1 -1
mediawiki/coremaster+118 -62
mediawiki/coremaster+136 -48
mediawiki/coremaster+1 -0
mediawiki/coremaster+18 -4
Show related patches Customize query in gerrit

Related Objects

StatusSubtypeAssignedTask
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
Opencscott
OpenNone
Resolvedcscott
ResolvedPRODUCTION ERRORLucas_Werkmeister_WMDE
ResolvedPRODUCTION ERRORabi_
ResolvedPRODUCTION ERRORcscott
ResolvedPRODUCTION ERRORroman-stolar
Opencscott
Opencscott
Opencscott
OpenNone
OpenNone
OpenNone
OpenNone
Opencscott
OpenNone
OpenNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

To add a nit to the naming bike shed:

  • I quite like RenderedContent. We now have RevisionRenderer and RendereredRevision in core, we also now have ContentRenderer in core, so the one thing missing is RenderedContent :)
  • I quite like the idea of splitting out mutators from accessors. Builder though is a little misused here - usually builder is a separate object that then has a 'build' method to return an immutable result. It feels impossible to make RendererdContent (e.g. ParserOutput) truly immutable, so builder pattern is not really a good fit here. Maybe MutableRenderedContent akin already existing MutableContext? Then we can have RenderedContent be a read-only interface, ParserOutput implement both, and we can split read vs write access by proper typehints.
  • How about making MutableRenderedContent support fluent interface?

::addCacheMessage()
This is marked @internal in core.
Not clear yet whether Parsoid needs this.

No, Parsoid definitely does not. This is used by ParserCache to add a HTML comment like 'cached with key bla and id foo' for debugging.

::addOutputHook()

The feature is only used in one extension. You can register global callbacks that will be run on the ParserOutput before display regardless whether it was parsed or loaded from cache. The extension is not deployed, if you make a subtask I'll try to remove this feature.

::addLink/::addTemplate()/::addInterwikiLink()/::addTrackingCategory()
Takes a Title as a parameter; need alternative using a Parsoid-available type. (eg ::addImage() takes 'Title dbKey' string)

This in core should really take PageIdentity/PageReference. This is something we will have some contention about... Now that we actually have nice types for value objects, we will be very upset not to use them...

Change 724846 merged by jenkins-bot:

[mediawiki/core@master] ParserOutput::isLinkInternal(): make @internal

https://gerrit.wikimedia.org/r/724846

Change 724847 merged by jenkins-bot:

[mediawiki/core@master] Rename ParserOutput::hideNewSection() -> ::setHideNewSection()

https://gerrit.wikimedia.org/r/724847

To add a nit to the naming bike shed:

  • I quite like RenderedContent. We now have RevisionRenderer and RendereredRevision in core, we also now have ContentRenderer in core, so the one thing missing is RenderedContent :)
  • I quite like the idea of splitting out mutators from accessors. Builder though is a little misused here - usually builder is a separate object that then has a 'build' method to return an immutable result. It feels impossible to make RendererdContent (e.g. ParserOutput) truly immutable, so builder pattern is not really a good fit here. Maybe MutableRenderedContent akin already existing MutableContext? Then we can have RenderedContent be a read-only interface, ParserOutput implement both, and we can split read vs write access by proper typehints.
  • How about making MutableRenderedContent support fluent interface?

Agreed Builder isn't quite the right thing -- well, not if ParserOutput is going to directly implement it. If ParserOutputBuilder had those methods, and then had a ::getResult() method that would return a ParserOutput, that would be more traditionally a 'Builder' class.

I'm not a big fan of 'Mutable' here, because I think the thing I want to emphase is that this is more like recording the properties of the rendered content, not 'mutating them'. That is, it's not that we render the content then go back and tweak or postprocess the links or something like that, which is what I think 'mutable' seems to suggest.

RenderedContent is a good phrase, although I think for the moment I'd deliberately leaving *out* the actual Content part of this, because getText() is a mass of hair and we already have reasonable Content interfaces, which (I think) allow me to access the content in various forms. This is more like a "information-about-content-extractor" or "content analyzer" or "analyzed content recorder". Hm.

::addCacheMessage()
This is marked @internal in core.
Not clear yet whether Parsoid needs this.

No, Parsoid definitely does not. This is used by ParserCache to add a HTML comment like 'cached with key bla and id foo' for debugging.

::addOutputHook()

The feature is only used in one extension. You can register global callbacks that will be run on the ParserOutput before display regardless whether it was parsed or loaded from cache. The extension is not deployed, if you make a subtask I'll try to remove this feature.

Created T292321: Remove ParserOutput::addOutputHook(), thanks.

::addLink/::addTemplate()/::addInterwikiLink()/::addTrackingCategory()
Takes a Title as a parameter; need alternative using a Parsoid-available type. (eg ::addImage() takes 'Title dbKey' string)

This in core should really take PageIdentity/PageReference. This is something we will have some contention about... Now that we actually have nice types for value objects, we will be very upset not to use them...

We discussed this in slack, I think we agreed that LinkTarget was probably the right thing here.

Change 724848 merged by jenkins-bot:

[mediawiki/core@master] Rename ParserOutput::{allow,prevent}Clickjacking() -> ::{get,set}PreventClickjacking()

https://gerrit.wikimedia.org/r/724848

Change 727401 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Rename ParserOutput::{get,set,unset}Property to {get,set,unset}PageProperty

https://gerrit.wikimedia.org/r/727401

Change 727402 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] WIP: Hard deprecate the renamed ParserOutput::*Property() methods

https://gerrit.wikimedia.org/r/727402

Change 727403 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Wikibase@master] Rename usages of deprecated ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727403

Change 727610 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/JsonConfig@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727610

Change 727612 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Kartographer@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727612

Change 727633 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/MobileFrontend@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727633

Change 727634 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/PageTriage@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727634

Change 727635 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/ProofreadPage@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727635

Change 727636 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/TemplateData@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727636

Change 728464 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Flow@master] Replace use of deprecated OutputPage::preventClickjacking()

https://gerrit.wikimedia.org/r/728464

Change 728467 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/TimedMediaHandler@master] Replace use of deprecated OutputPage::allowClickjacking()

https://gerrit.wikimedia.org/r/728467

Change 728469 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/WikibaseMediaInfo@master] Replace use of deprecated OutputPage::preventClickjacking()

https://gerrit.wikimedia.org/r/728469

Change 728470 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Hard-deprecate ParserOutput::preventClickjacking()

https://gerrit.wikimedia.org/r/728470

Change 728476 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Cite@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728476

Change 728481 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/LiquidThreads@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728481

Change 728483 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Score@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728483

Change 728486 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/WikidataPageBanner@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728486

Change 727401 merged by jenkins-bot:

[mediawiki/core@master] Rename ParserOutput::{get,set,unset}Property to {get,set,unset}PageProperty

https://gerrit.wikimedia.org/r/727401

Change 728491 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Rename ParserOutput::setCategoryLinks() to ::setCategories()

https://gerrit.wikimedia.org/r/728491

Change 727610 merged by jenkins-bot:

[mediawiki/extensions/JsonConfig@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727610

Change 727634 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727634

Change 727635 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727635

Change 728464 merged by jenkins-bot:

[mediawiki/extensions/Flow@master] Replace use of deprecated OutputPage::preventClickjacking()

https://gerrit.wikimedia.org/r/728464

Change 728481 merged by jenkins-bot:

[mediawiki/extensions/LiquidThreads@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728481

Change 728500 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/BlueSpiceEchoConnector@master] Rename deprecated ParserOutput::getCategoryLinks()

https://gerrit.wikimedia.org/r/728500

Change 728502 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/NoCat@master] Rename deprecated ParserOutput::setCategoryLinks()

https://gerrit.wikimedia.org/r/728502

Change 728503 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/PageInCat@master] Rename deprecated ParserOutput::getCategoryLinks()

https://gerrit.wikimedia.org/r/728503

Change 728504 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Hard deprecate ParserOutput::{get,set}CategoryLinks()

https://gerrit.wikimedia.org/r/728504

Change 728476 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728476

Change 727612 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727612

Change 728483 merged by jenkins-bot:

[mediawiki/extensions/Score@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728483

Change 727633 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727633

Change 728467 merged by jenkins-bot:

[mediawiki/extensions/TimedMediaHandler@master] Replace use of deprecated OutputPage::allowClickjacking()

https://gerrit.wikimedia.org/r/728467

Change 728469 merged by jenkins-bot:

[mediawiki/extensions/WikibaseMediaInfo@master] Replace use of deprecated OutputPage::preventClickjacking()

https://gerrit.wikimedia.org/r/728469

Change 727636 merged by jenkins-bot:

[mediawiki/extensions/TemplateData@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727636

cscott renamed this task from ParserOutput should implement ContentOutputBuilder to ParserOutput should implement ContentMetadataCollector.Oct 8 2021, 6:12 PM
cscott updated the task description. (Show Details)

Change 728665 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/NoCat@master] Rename deprecated use of ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/728665

Change 728470 merged by jenkins-bot:

[mediawiki/core@master] Hard-deprecate ParserOutput::preventClickjacking()

https://gerrit.wikimedia.org/r/728470

Change 728665 merged by jenkins-bot:

[mediawiki/extensions/NoCat@master] Rename deprecated use of ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/728665

Change 728486 merged by jenkins-bot:

[mediawiki/extensions/WikidataPageBanner@master] Rename deprecated usage of ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/728486

Change 727403 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Rename usages of deprecated ParserOutput::{get,set}Property()

https://gerrit.wikimedia.org/r/727403

Change 727402 merged by jenkins-bot:

[mediawiki/core@master] Hard deprecate the renamed ParserOutput::*Property() methods

https://gerrit.wikimedia.org/r/727402

Change 728491 merged by jenkins-bot:

[mediawiki/core@master] Rename ParserOutput::setCategoryLinks() and ::getCategoryLinks()

https://gerrit.wikimedia.org/r/728491

Change 728504 merged by jenkins-bot:

[mediawiki/core@master] Hard deprecate ParserOutput::{get,set}CategoryLinks()

https://gerrit.wikimedia.org/r/728504

Change 728500 merged by jenkins-bot:

[mediawiki/extensions/BlueSpiceEchoConnector@master] Rename deprecated ParserOutput::getCategoryLinks()

https://gerrit.wikimedia.org/r/728500

Change 728503 merged by jenkins-bot:

[mediawiki/extensions/PageInCat@master] Rename deprecated ParserOutput::getCategoryLinks()

https://gerrit.wikimedia.org/r/728503

Change 728502 merged by jenkins-bot:

[mediawiki/extensions/NoCat@master] Rename deprecated use of ParserOutput::setCategoryLinks()

https://gerrit.wikimedia.org/r/728502

Change 731180 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/CollaborationKit@master] Rename usage of deprecated ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/731180

Change 731180 merged by jenkins-bot:

[mediawiki/extensions/CollaborationKit@master] Rename usage of deprecated ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/731180

Change 732354 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Wikibase@master] Rename usages of deprecated ParserOutput::{get,set}Property() (take 2)

https://gerrit.wikimedia.org/r/732354

Change 732357 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/LiquidThreads@master] Rename use of deprecated ParserOutput::getProperty()

https://gerrit.wikimedia.org/r/732357

Change 732354 abandoned by C. Scott Ananian:

[mediawiki/extensions/Wikibase@master] Rename usages of deprecated ParserOutput::{get,set}Property() (take 2)

Reason:

Abandon in favor of I70706415ae657f6783b7ae78e2dbe1b96ca31dbe

https://gerrit.wikimedia.org/r/732354

Change 732357 abandoned by C. Scott Ananian:

[mediawiki/extensions/LiquidThreads@master] Rename use of deprecated ParserOutput::getProperty()

Reason:

Abandon in favor of I78b15a6e12c462eb7f3c75b2549801e8324bc591

https://gerrit.wikimedia.org/r/732357

Change 732360 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/GeoCrumbs@master] Replace use of deprecated ParserOutput:getProperty()

https://gerrit.wikimedia.org/r/732360

Change 732333 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/GeoCrumbs@wmf/1.38.0-wmf.5] Replace use of deprecated ParserOutput:getProperty()

https://gerrit.wikimedia.org/r/732333

Change 732333 merged by jenkins-bot:

[mediawiki/extensions/GeoCrumbs@wmf/1.38.0-wmf.5] Replace use of deprecated ParserOutput:getProperty()

https://gerrit.wikimedia.org/r/732333

Change 732360 merged by jenkins-bot:

[mediawiki/extensions/GeoCrumbs@master] Replace use of deprecated ParserOutput:getProperty()

https://gerrit.wikimedia.org/r/732360

Change 726944 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Add ContentMetadataCollector interface

https://gerrit.wikimedia.org/r/726944

Change 726944 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add ContentMetadataCollector interface

https://gerrit.wikimedia.org/r/726944

Change 742570 had a related patch set uploaded (by Sbailey; author: Sbailey):

[mediawiki/vendor@master] Bump Parsoid to 0.15.0-a11

https://gerrit.wikimedia.org/r/742570

Change 742570 merged by jenkins-bot:

[mediawiki/vendor@master] Bump Parsoid to 0.15.0-a11

https://gerrit.wikimedia.org/r/742570

Change 748830 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] ParserOutput now extends Parsoid's ContentMetadataCollector interface

https://gerrit.wikimedia.org/r/748830

Change 752230 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] ParserOutput: implement the abstract ContentMetadataCollector interface

https://gerrit.wikimedia.org/r/752230

Change 748830 abandoned by C. Scott Ananian:

[mediawiki/core@master] ParserOutput now implements Parsoid's ContentMetadataCollector interface

Reason:

Duplicate of I15c0e81185b9957fe097c82e6609a200742ee7d1

https://gerrit.wikimedia.org/r/748830