Page MenuHomePhabricator

ParserOutput->hasText() is confusing and probably broken (And hence generate-html => false is probably buggy)
Closed, ResolvedPublicBUG REPORT

Description

Split from T299896

So right now, if you want to do just a metadata parser output with no html generated, you have to construct your parser output as new ParserOutput( null ).

Confusingly, if you do something like this:

$pout = new ParserOutput;
$pout->setText( null );
$pout->hasText(); // This returns true;

you don't get the result you expect. Once a ParserOutput is constructed there is no possible way to mark it as not containing text. Even if you never call ->setText, ->hasText() will still return true. This is despite the docs strongly implying that ->setText(null) is what you are supposed to do to mark the ParserOutput as having no text (However, it should be noted, that in almost every place where generate-html is set to false, the actual code does ->setText( '' ), notwithstanding the docs saying that is wrong).

However (with one exception) mediawiki, never constructs them that way, so in essence ParserOutput->hasText will always return true no matter what you do. This suggests that the "generate-html" => false option is probably subtly broken all throughout core.

I think what we should do here:

  • Make the default value of the first arg to the ParserOutput constructor be null instead of ''
  • Make ParserOutput::setText( null ) actually set the text to be null instead of being a no-op [Is this a breaking API change?]
  • The requirements here should be documented more clearly, namely, that if you are respecting generate-html=> false, you must have parser output text set to null. Many places in code will simply half generate the html, and have setText set to a value that contains half the html.

(I'm not that familiar with MCR, so I would appreciate feedback on if i am missing something here)

As an aside, I also wonder if the generate-html optimization is worth all this complexity, or if in the cases it is used, we eventually need to do a full parse later on in the request, and it causes us to parse the page twice where otherwise we'd be able to reuse the parse.

Event Timeline

Change 785247 had a related patch set uploaded (by Brian Wolff; author: Brian Wolff):

[mediawiki/core@master] Clarify generate-html and make ParserOutput behave as expected

https://gerrit.wikimedia.org/r/785247

Change 785276 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Fix EntityHandlerTestCase::testPageProperties()

https://gerrit.wikimedia.org/r/785276

Change 785282 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Don’t assert text of parser output without HTML

https://gerrit.wikimedia.org/r/785282

Change 785297 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Assert that parser output without HTML has no text

https://gerrit.wikimedia.org/r/785297

Change 785276 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Fix EntityHandlerTestCase::testPageProperties()

https://gerrit.wikimedia.org/r/785276

Change 785282 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Don’t assert text of parser output without HTML

https://gerrit.wikimedia.org/r/785282

Change 785247 merged by jenkins-bot:

[mediawiki/core@master] Clarify generate-html and make ParserOutput behave as expected

https://gerrit.wikimedia.org/r/785247

Change 785297 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Assert that parser output without HTML has no text

https://gerrit.wikimedia.org/r/785297

Bawolff claimed this task.