Page MenuHomePhabricator

MWException: Bad parser output text. (from ParserOutput::getText)
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

MediaWiki version: 1.36.0-wmf.5

message
MWException: Bad parser output text.
exception.trace
#0 [internal function]: ParserOutput->{closure}(array)
#1 /srv/mediawiki/php-1.36.0-wmf.5/includes/parser/ParserOutput.php(385): preg_replace_callback(string, Closure, string)
#2 /srv/mediawiki/php-1.36.0-wmf.5/includes/OutputPage.php(2004): ParserOutput->getText(array)
#3 /srv/mediawiki/php-1.36.0-wmf.5/includes/OutputPage.php(2017): OutputPage->addParserOutputText(ParserOutput, array)
#4 /srv/mediawiki/php-1.36.0-wmf.5/includes/page/Article.php(844): OutputPage->addParserOutput(ParserOutput, array)
#5 /srv/mediawiki/php-1.36.0-wmf.5/includes/actions/ViewAction.php(74): Article->view()
#6 /srv/mediawiki/php-1.36.0-wmf.5/includes/MediaWiki.php(527): ViewAction->show()
#7 /srv/mediawiki/php-1.36.0-wmf.5/includes/MediaWiki.php(313): MediaWiki->performAction(Article, Title)
#8 /srv/mediawiki/php-1.36.0-wmf.5/includes/MediaWiki.php(940): MediaWiki->performRequest()
#9 /srv/mediawiki/php-1.36.0-wmf.5/includes/MediaWiki.php(543): MediaWiki->main()
#10 /srv/mediawiki/php-1.36.0-wmf.5/index.php(53): MediaWiki->run()
#11 /srv/mediawiki/php-1.36.0-wmf.5/index.php(46): wfIndexMain()
#12 /srv/mediawiki/w/index.php(3): require(string)
#13 {main}

Impact

Notes

Possibly related to T205678: Unable to view certain pages due to Fatal exception: "Bad parser output text"

Details

Request ID
b81eb9de-e1b6-4c6e-8adc-be31694924a7
Request URL
https://diq.wikipedia.org/wiki/Vaten:%C4%B0lm

Event Timeline

Assuming this task is about the mediawiki-parser code project, hence adding that project tag so other people who don't know or don't care about team tags can also find this task when searching via projects.

Krinkle renamed this task from Bad parser output text. to MWException: Bad parser output text. (from ParserOutput::getText).Sep 22 2020, 7:59 PM
Krinkle updated the task description. (Show Details)
Krinkle edited Stack Trace. (Show Details)
Krinkle triaged this task as Unbreak Now! priority.EditedSep 22 2020, 8:02 PM
Krinkle subscribed.

This is not an issue with the parser, but rather with the output logic toward the skin.

I suspect that either the ParserOutput::getText() method has an issue its logic for transforming for the skin (\cc Readers-Web as knowing that area and having worked on it), or that the general parser cache interactions have caused some kind of corruption (\cc Platform Engineering as knowing about that).

Marking as High because I've never seen this before in production and while it might be infrequent it seems scary that this can happen. Would be better to know for sure what caused it so we know that it can't happen again.

Krinkle lowered the priority of this task from Unbreak Now! to High.EditedSep 22 2020, 8:02 PM
Krinkle updated the task description. (Show Details)

Woops, this crash is easily reproducible at https://diq.wikipedia.org/wiki/Vaten:%C4%B0lm.

Restricting task visibility to reduce risk of someone using this as a way to fire alarms or abort deployments at will.

Krinkle changed the visibility from "Public (No Login Required)" to "Custom Policy".Sep 22 2020, 8:06 PM

Woops, this crash is easily reproducible at https://diq.wikipedia.org/wiki/Vaten:%C4%B0lm.

Content is:

#REDIRECT [[Talk:Portal:Zanayış]]

And https://diq.wikipedia.org/wiki/Talk:Portal:Zanay%C4%B1%C5%9F returns:

Bad title
The requested page title refers to a talk page that cannot exist.

And https://diq.wikipedia.org/wiki/Talk:Portal:Zanay%C4%B1%C5%9F returns:

Bad title
The requested page title refers to a talk page that cannot exist.

Yes, https://en.wikipedia.org/wiki/Talk:User:X also currectly returns this. The subject page of Talk:User:X would be User:X, but the talk page of User:X is User_talk:X. There cannot be a page for which the talk page is Talk:User:X. This part works as designed.

However, why does the redirect trigger a fatal error? https://en.wikipedia.org/wiki/User:DKinzler_(WMF)/T261347 doesn't, it's just not a redirect.

This is not an issue with the parser, but rather with the output logic toward the skin.

I'm not sure about that.

ParserOutput::getText() is trying to inject [edit] section links nased on '<mw:editsection>' placeholders emitted by the parser. The exception is caused by Title::newFromText( htmlspecialchars_decode( $m[1] ) ) returning null for an invalid title. That title comes from $editlink = '<mw:editsection page="' . htmlspecialchars( $editsectionPage ) in Parser. $editsectionPage usually comes from $this->getTitle()->getPrefixedText(), which should never be invalid.

Also... how would https://diq.wikipedia.org/wiki/Vaten:%C4%B0lm even trigger edit section links?

Hm, why is the patch not being linked to the ticket? Anyway, here is a patch that makes ParserOutput log an error instead of throwing. This way, we can continue investigating without user impact.

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629421

Note:

  • The fatal error happens while the destination of the redirect is attempted to be rendered, not the one in the URL. ("Soft redirects").
  • Viewing the latest revision of the URL directly, via https://diq.wikipedia.org/w/index.php?title=Vaten:%C4%B0lm&oldid=44966 does not produce an error because that is merely asking to render the redirect code itself. And interestingly, that is not producing a clickable link because it is an invalid link target. It instead falls back to the expected plain text rendering.
  • Given this rendering behaviour the question then becomes: How come this was saved in a way that instructed MediaWiki to store a redirect in the database? If you purge this title or save the same wikitext elsewhere, it would not produce a redirect and thus be fine.
  • I suspect the root cause here is that maybe namespaces or localisation thereof changed on the wiki, and we did not run the neccecary scripts such as namespaceDupes.php or some such. Or maybe we did, but the script has a blind spot that is failing to have converted/fixed this?
  • Is checkBadRedirects.php relevant?
  • We can probably tolerate a more graceful failure here since there's an established fallback for this sort of scenario, namely to render it as plain text.

Some things we could do now:

  • Find root cause.
  • Then fix root cause if it's a new issue, or if we just forgot to run scripts, run them.
  • Make this failure more graceful, render as escaped/plain text?
This comment was removed by daniel.
> select * from redirect where rd_from = 8156 \G
*************************** 1. row ***************************
     rd_from: 8156
rd_namespace: 1
    rd_title: Portal:Zanayış
rd_interwiki: 
 rd_fragment: 
1 row in set (0.05 sec)

Indeed, the page is recorded in the database as a redirect. And purging would probably fix the issue. Please don't do that just yet though, having a way to reproduce is useful.

I suppose this redirect was created when "Portal:" was just a prefix in the main namespace, so the talk page would have been "Talk:Portal:Xyz". When the Portal namespace was created, this link was not fixed. Does namespaceDupes even try to detect this case?

What I find strange is the way this confuses the parser. I'll try to re-create it locally.

Hm, why is the patch not being linked to the ticket?

Because this ticket is not public.

Locally creating a page that redirects to Talk:Portal:X, and then creating the Portal namespace and running namespaceDupes indeed leaves the bad row in the database.
However, I can't reproduce the hard failure this way. The page just renders as plain wikitext, as would be expected.

daniel lowered the priority of this task from High to Medium.Sep 24 2020, 7:01 PM

The patches removing the exception have been merged. Lowing prio.

tstarling raised the priority of this task from Medium to High.Feb 23 2021, 11:05 PM

We verified today in clinic duty collab that the linked page still throws an exception.

https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-deploy-2021.02.23?id=Nxjn0HcBoqQO90dBtRwu

Postcondition failed: makeTitleSafe() should always return a Title for the text returned by getRootText().
	
from /srv/mediawiki/php-1.36.0-wmf.31/vendor/wikimedia/assert/src/Assert.php(202)
#0 /srv/mediawiki/php-1.36.0-wmf.31/includes/Title.php(2023): Wikimedia\Assert\Assert::postcondition(boolean, string)
#1 /srv/mediawiki/php-1.36.0-wmf.31/extensions/Popups/includes/PopupsContext.php(194): Title->getRootTitle()
#2 /srv/mediawiki/php-1.36.0-wmf.31/extensions/Popups/includes/PopupsHooks.php(100): Popups\PopupsContext->isTitleExcluded(Title)
#3 /srv/mediawiki/php-1.36.0-wmf.31/includes/HookContainer/HookContainer.php(333): Popups\PopupsHooks::onBeforePageDisplay(OutputPage, SkinVector)
#4 /srv/mediawiki/php-1.36.0-wmf.31/includes/HookContainer/HookContainer.php(140): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#5 /srv/mediawiki/php-1.36.0-wmf.31/includes/HookContainer/HookRunner.php(1004): MediaWiki\HookContainer\HookContainer->run(string, array, array)
#6 /srv/mediawiki/php-1.36.0-wmf.31/includes/OutputPage.php(2626): MediaWiki\HookContainer\HookRunner->onBeforePageDisplay(OutputPage, SkinVector)
#7 /srv/mediawiki/php-1.36.0-wmf.31/includes/MediaWiki.php(951): OutputPage->output(boolean)
#8 /srv/mediawiki/php-1.36.0-wmf.31/includes/libs/rdbms/lbfactory/LBFactory.php(608): MediaWiki::{closure}()
#9 /srv/mediawiki/php-1.36.0-wmf.31/includes/libs/rdbms/lbfactory/LBFactory.php(228): Wikimedia\Rdbms\LBFactory->shutdownChronologyProtector(Wikimedia\Rdbms\ChronologyProtector, Closure, string, integer)
#10 /srv/mediawiki/php-1.36.0-wmf.31/includes/MediaWiki.php(702): Wikimedia\Rdbms\LBFactory->shutdown(integer, Closure, integer, NULL)
#11 /srv/mediawiki/php-1.36.0-wmf.31/includes/MediaWiki.php(653): MediaWiki::preOutputCommit(RequestContext, Closure)
#12 /srv/mediawiki/php-1.36.0-wmf.31/includes/MediaWiki.php(960): MediaWiki->doPreOutputCommit(Closure)
#13 /srv/mediawiki/php-1.36.0-wmf.31/includes/MediaWiki.php(548): MediaWiki->main()
#14 /srv/mediawiki/php-1.36.0-wmf.31/index.php(53): MediaWiki->run()
#15 /srv/mediawiki/php-1.36.0-wmf.31/index.php(46): wfIndexMain()
#16 /srv/mediawiki/w/index.php(3): require(string)
#17 {main}

A similar exception is reproducible locally by setting up the redirect table to have an invalid title in it. It's avoidable by using makeTitleSafe() instead of makeTitle() in WikiPage::getRedirectTarget(), which is probably a good idea given the amount of developer effort that goes into tracking these things down. Theoretically the database is supposed to contain valid titles, so you can skip validation. But maybe that's not a realistic goal anymore.

Fix submitted: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/666507

I don't think reproducible exceptions need to be private tasks. It's a nuisance, and the worst case impact is not that bad.

Krinkle changed the visibility from "Custom Policy" to "Public (No Login Required)".

Should the namespaceDupes.php script be fixed so that it does not leave behind bad redirect rows? I suppose the extra pass through the redirect table would make the script much slower, but I'd prefer that to leaving broken entries in the database.

Change 666507 merged by jenkins-bot:
[mediawiki/core@master] Safer WikiPage redirect functions

https://gerrit.wikimedia.org/r/666507

Krinkle assigned this task to tstarling.

In change 629421 the message changed from Bad parser output text to bad title in editsection placeholder and moved from the exception to the Parser channel.

In the past 90 days, neither has been seen in Logstash.