TypeError: Argument 4 passed to Wikimedia\Parsoid\Utils\Title::__construct() must be of the type string, null given, called in /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Utils/Title.php on line 392
Open, High · Public · PRODUCTION ERROR

Description

Error
labels.normalized_message
[{reqId}] {exception_url}   TypeError: Argument 4 passed to Wikimedia\Parsoid\Utils\Title::__construct() must be of the type string, null given, called in /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Utils/Title.php on line 392
error.stack_trace
from /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Utils/Title.php(40)
#0 /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Utils/Title.php(392): Wikimedia\Parsoid\Utils\Title->__construct(string, string, integer, NULL, string)
#1 /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Wt2Html/PageConfigFrame.php(29): Wikimedia\Parsoid\Utils\Title::newFromLinkTarget(MediaWiki\Title\Title, MediaWiki\Parser\Parsoid\Config\SiteConfig)
#2 /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Config/Env.php(291): Wikimedia\Parsoid\Wt2Html\PageConfigFrame->__construct(Wikimedia\Parsoid\Config\Env, MediaWiki\Parser\Parsoid\Config\PageConfig, MediaWiki\Parser\Parsoid\Config\SiteConfig)
#3 /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Parsoid.php(179): Wikimedia\Parsoid\Config\Env->__construct(MediaWiki\Parser\Parsoid\Config\SiteConfig, MediaWiki\Parser\Parsoid\Config\PageConfig, MediaWiki\Parser\Parsoid\Config\DataAccess, MediaWiki\Parser\ParserOutput, array)
#4 /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Parsoid.php(232): Wikimedia\Parsoid\Parsoid->parseWikitext(MediaWiki\Parser\Parsoid\Config\PageConfig, MediaWiki\Parser\ParserOutput, array)
#5 /srv/mediawiki/php-1.42.0-wmf.15/includes/parser/Parsoid/ParsoidParser.php(152): Wikimedia\Parsoid\Parsoid->wikitext2html(MediaWiki\Parser\Parsoid\Config\PageConfig, array, NULL, MediaWiki\Parser\ParserOutput)
#6 /srv/mediawiki/php-1.42.0-wmf.15/includes/parser/Parsoid/ParsoidParser.php(260): MediaWiki\Parser\Parsoid\ParsoidParser->genParserOutput(MediaWiki\Parser\Parsoid\Config\PageConfig, ParserOptions)
#7 /srv/mediawiki/php-1.42.0-wmf.15/includes/content/WikitextContentHandler.php(397): MediaWiki\Parser\Parsoid\ParsoidParser->parse(string, MediaWiki\Title\Title, ParserOptions, boolean, boolean, integer)
#8 /srv/mediawiki/php-1.42.0-wmf.15/includes/content/ContentHandler.php(1683): WikitextContentHandler->fillParserOutput(WikitextContent, MediaWiki\Content\Renderer\ContentParseParams, MediaWiki\Parser\ParserOutput)
#9 /srv/mediawiki/php-1.42.0-wmf.15/includes/content/Renderer/ContentRenderer.php(47): ContentHandler->getParserOutput(WikitextContent, MediaWiki\Content\Renderer\ContentParseParams)
#10 /srv/mediawiki/php-1.42.0-wmf.15/includes/Revision/RenderedRevision.php(260): MediaWiki\Content\Renderer\ContentRenderer->getParserOutput(WikitextContent, MediaWiki\Title\Title, integer, ParserOptions, boolean)
#11 /srv/mediawiki/php-1.42.0-wmf.15/includes/Revision/RenderedRevision.php(232): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(WikitextContent, boolean)
#12 /srv/mediawiki/php-1.42.0-wmf.15/includes/Revision/RevisionRenderer.php(226): MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string, array)
#13 /srv/mediawiki/php-1.42.0-wmf.15/includes/Revision/RevisionRenderer.php(164): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, ParserOptions, array)
#14 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(MediaWiki\Revision\RenderedRevision, array)
#15 /srv/mediawiki/php-1.42.0-wmf.15/includes/Revision/RenderedRevision.php(199): call_user_func(Closure, MediaWiki\Revision\RenderedRevision, array)
#16 /srv/mediawiki/php-1.42.0-wmf.15/includes/poolcounter/PoolWorkArticleView.php(87): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#17 /srv/mediawiki/php-1.42.0-wmf.15/includes/poolcounter/PoolWorkArticleViewCurrent.php(110): MediaWiki\PoolCounter\PoolWorkArticleView->renderRevision()
#18 /srv/mediawiki/php-1.42.0-wmf.15/includes/poolcounter/PoolCounterWork.php(172): MediaWiki\PoolCounter\PoolWorkArticleViewCurrent->doWork()
#19 /srv/mediawiki/php-1.42.0-wmf.15/includes/page/ParserOutputAccess.php(307): MediaWiki\PoolCounter\PoolCounterWork->execute()
#20 /srv/mediawiki/php-1.42.0-wmf.15/includes/parser/Parsoid/ParsoidOutputAccess.php(197): MediaWiki\Page\ParserOutputAccess->getParserOutput(MediaWiki\Page\PageStoreRecord, ParserOptions, MediaWiki\Revision\RevisionStoreRecord, integer)
#21 /srv/mediawiki/php-1.42.0-wmf.15/includes/jobqueue/jobs/ParsoidCachePrewarmJob.php(138): MediaWiki\Parser\Parsoid\ParsoidOutputAccess->getParserOutput(MediaWiki\Page\PageStoreRecord, ParserOptions, MediaWiki\Revision\RevisionStoreRecord, integer)
#22 /srv/mediawiki/php-1.42.0-wmf.15/includes/jobqueue/jobs/ParsoidCachePrewarmJob.php(150): ParsoidCachePrewarmJob->doParsoidCacheUpdate()
#23 /srv/mediawiki/php-1.42.0-wmf.15/extensions/EventBus/includes/JobExecutor.php(80): ParsoidCachePrewarmJob->run()
#24 /srv/mediawiki/rpc/RunSingleJob.php(60): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#25 {main}
Impact

Real impact unknown. This is triggering MediaWiki error rate alerts.

Notes

Seems to have been introduced in wmf.15.

Details

MediaWiki Version
1.42.0-wmf.15
Request URL
https://mw-jobrunner.discovery.wmnet/rpc/RunSingleJob.php

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 28 2024, 9:56 PM
hnowlan added a project: serviceops.
hnowlan subscribed.

A surge in this error started around 0700 on the 27th of January, and it seems to occur in volume only on hewikisource (there was also a brief spike on itwikivoyage on the 26th).

Raising to high as this is consistently causing alerts on the mw-jobrunner cluster.

From debugging:

A quick look at the errors in the logs turned up three things:

  • Failures mostly occur in jobrunner, in jobs triggered by parsoidCachePrewarm events
  • The error rate is high because changeprop retries those failures
  • Querying by reqid shows that the URI triggering the root failure involves bad titles

Also this looks like a similar issue, the titles failing have same prefix (Special:Badtitle/NS100).
For example, from the logs:

מיוחד:Badtitle/NS100:מצודות על מלכים ב א יג

Also, per the stack trace, the null argument that causes the error (argument 4) is the namespace.

A placeholder title like Special:Badtitle/NS100 will show up when we try to instantiate a Title with an unknown namespace ID. This would happen e.g. when the extension defining a namespace got disabled, but there are still references to that namespace in the database. This situation is bound to lead to interesting edge cases all over the place.
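To make the mechanism concrete, here is a minimal Python sketch of a namespace lookup that falls back to a placeholder. It is a simplified model, not MediaWiki's implementation: the `KNOWN_NAMESPACES` table and both function names are hypothetical, and the table is assumed to describe a wiki where ID 100 is no longer configured.

```python
from typing import Optional

# Hypothetical namespace table for a wiki where ID 100 was removed from config.
KNOWN_NAMESPACES = {-1: "Special", 0: "", 1: "Talk"}


def namespace_name(ns_id: int) -> Optional[str]:
    """Return the configured name for ns_id, or None if the ID is unknown."""
    return KNOWN_NAMESPACES.get(ns_id)


def display_title(ns_id: int, text: str) -> str:
    """Model the legacy fallback: an unknown namespace yields a placeholder."""
    name = namespace_name(ns_id)
    if name is None:
        # The name lookup returned nothing, so only the numeric ID can be shown.
        return f"Special:Badtitle/NS{ns_id}:{text}"
    # The main namespace has an empty name and therefore no prefix.
    return f"{name}:{text}" if name else text
```

In this model, any caller that requires the namespace name to be a string receives `None` for ID 100, which is the kind of value the TypeError in the description complains about.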

Wikimedia\Parsoid\Utils\Title::newFromLinkTarget should be more defensive, and should work with a "bad title". That would prevent the TypeError. It won't fix the underlying problem though.

The root cause seems to be that a namespace was removed from hewikisource (two years ago?!): https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/752634, see T298430: hewikisource - remove "קטע" namespace.

It seems like we are trying to run jobs for reparsing pages in a namespace that no longer exists... or at least, we are reparsing pages that in some way make use of a reference to a non-existing namespace ID.

After digging a bit more on the logs it looks like there are some references from changeprop for failing requests:
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-k8s-1-7.0.0-1-2024.01.24?id=6PjcOY0BW1B7XSkmsiPX

The actual event that triggered the changeprop request has a title that's not correct:

{"$schema":"/resource_change/1.0.0","meta":{"uri":"https://it.wikivoyage.org/wiki/Speciale:Badtitle/NS105:Cammino_di_Oropa","request_id":"6c0e3657-8a97-46be-8507-b5f96010a6f9","id":"3850ee7d-9992-4de1-b04f-abf4c2caa7bd","dt":"2024-01-24T05:06:21Z","domain":"it.wikivoyage.org","stream":"resource_change"},"tags":["purge"]}

So the input to the jobrunner is incorrect to start with. It's not necessarily a problem with title resolution, as the stack trace might suggest.
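A check like the following could flag such events before a job is created. This is a sketch under stated assumptions: the function name, the regular expression, and the `/wiki/` path prefix are illustrative and are not part of changeprop or MediaWiki. The pattern accepts any localized alias of the Special namespace (for example "Speciale"), since the alias differs per wiki.

```python
import json
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Matches "<Special alias>:Badtitle/NS<id>:" at the start of a page name.
BADTITLE_RE = re.compile(r"^[^:/]+:Badtitle/NS(\d+):")


def bad_namespace_id(event_json: str) -> Optional[int]:
    """Return the namespace ID embedded in a Badtitle URI, or None."""
    event = json.loads(event_json)
    # Percent-decode the path so localized titles can be matched as text.
    path = unquote(urlsplit(event["meta"]["uri"]).path)
    prefix = "/wiki/"  # assumed article path
    page = path[len(prefix):] if path.startswith(prefix) else path
    match = BADTITLE_RE.match(page)
    return int(match.group(1)) if match else None
```

Returning the namespace ID rather than a boolean lets the caller log which missing namespace produced the placeholder.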

I think this is a better representation of logs of the lifecycle for a given failing request:
https://logstash.wikimedia.org/goto/5dce2d82ccf9d4ca0ceab11c64884efa

The URL of the oldest log entries at the bottom is: /w/api.php?action=purge&format=xml... Is this coming from some external script?

The same doesn't apply though for hewikisource.
Here is a similar query for the hewikisource first error:
https://logstash.wikimedia.org/goto/150ade8bd1f0109d931020d48bb9887f

> Wikimedia\Parsoid\Utils\Title::newFromLinkTarget should be more defensive, and should work with a "bad title". That would prevent the TypeError. It won't fix the underlying problem though.

I'm not entirely sure I agree with this, as it would make us completely blind as to the underlying problem here. Most title constructors have a "throw exception" version and a "returns null" version -- the legacy parser is a little unusual in that it substitutes Special:BadTitle instead of throwing, but that's because of old legacy compatibility constraints and not something to be emulated in new code IMO.

I'm happy with throwing, but it would be nice to throw a more specific exception with a more informative message.
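As an illustration of that suggestion, the sketch below raises a dedicated exception that names the page and the missing namespace ID. It is written in Python for brevity and does not reproduce the Parsoid patch: `UnknownNamespaceError`, the `Title` dataclass, and `title_from_link_target` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Mapping


class UnknownNamespaceError(ValueError):
    """Raised when a page refers to a namespace ID the wiki does not define."""


@dataclass(frozen=True)
class Title:
    # Hypothetical stand-in for a title value object; not Parsoid's class.
    ns_id: int
    ns_name: str
    db_key: str


def title_from_link_target(
    ns_id: int, db_key: str, ns_names: Mapping[int, str]
) -> Title:
    """Build a Title, failing loudly and specifically on an unknown namespace."""
    ns_name = ns_names.get(ns_id)
    if ns_name is None:
        raise UnknownNamespaceError(
            f"Cannot build title for {db_key!r}: namespace ID {ns_id} "
            "is not defined in the site configuration"
        )
    return Title(ns_id, ns_name, db_key)
```

A dedicated exception type still surfaces the underlying data problem, but the log message points at the missing namespace instead of at a constructor argument.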

This appears to have trailed off around 1400 on the 29th, but if there is a risk of this recurring it'd be great if we could avoid these exception spikes in future.

I'm able to reproduce this issue on scandium with

NO_PROXY="" no_proxy="" curl --proxy scandium.eqiad.wmnet:80 "http://he.wikisource.org/w/rest.php/he.wikisource.org/v3/page/html/מצודות_על_מלכים_ב_א_יג/1211576"

https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-default-1-7.0.0-1-2024.01.30?id=IMUDXI0BySCoT0gdPgIw

Looking at the trace in the description above, the crash is in Parsoid, yes, but the issue is that a MediaWiki\Title\Title is constructed with a namespace number that Language::getNamespaces doesn't know about.

The title is created from Title::makeTitle from Title::newFromPageReference, so it gets the namespace from the page reference, which comes from a MediaWiki\Revision\RevisionStoreRecord. It's coming from the database.

I downloaded a dump of hewikisource and here's the relevant output of grep -A 10 -B 10 "מצודות על מלכים ב א יג" hewikisource-20240120-pages-meta-current.xml

  <page>
    <title>מצודות על מלכים ב א יג</title>
    <ns>0</ns>
    <id>138066</id>
    <revision>
      <id>1276519</id>
      <parentid>1211575</parentid>
      <timestamp>2021-10-02T21:26:43Z</timestamp>
      <contributor>
        <username>מושך בשבט בוט</username>
        <id>27882</id>
      </contributor>
--
        <id>27882</id>
      </contributor>
      <comment>מושך בשבט בוט העביר את הדף [[קטע:רמב&quot;ן על בראשית יד טו]] לשם [[רמב&quot;ן על בראשית יד טו]]: העברה בעקבות ביטול מרחב השם &quot;קטע&quot;</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text bytes="53" xml:space="preserve">#הפניה [[רמב&quot;ן על בראשית יד טו]]</text>
      <sha1>ldefn2yf71demjfj3wju7eevp03yp65</sha1>
    </revision>
  </page>
  <page>
    <title>מצודות על מלכים ב א יג</title>
    <ns>100</ns>
    <id>425300</id>
    <redirect title="מצודות על מלכים ב א יג" />
    <revision>
      <id>1211576</id>
      <timestamp>2021-09-26T17:49:49Z</timestamp>
      <contributor>
        <username>ShalomOrobot</username>
        <id>28707</id>
      </contributor>
      <comment>ShalomOrobot העביר את הדף [[קטע:מצודות על מלכים ב א יג]] לשם [[מצודות על מלכים ב א יג]]: העברה בעקבות ביטול מרחב השם &quot;קטע&quot;</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text bytes="55" xml:space="preserve">#הפניה [[מצודות על מלכים ב א יג]]</text>
      <sha1>q2a2beryjv8vohgo2nxriz7hdg3cxpn</sha1>
    </revision>
  </page>

The revision id from the second page is what's used above.

<ns>100</ns>

That's the problem. That namespace doesn't exist, and hasn't for two years. T298430#7608752 says that all pages have been moved away from the namespace, but apparently, that's not true.

The fix is to manually change the namespace ID of the page to a different namespace.

We should catch this situation waaayyy earlier, and avoid generating jobs for bad pages.

What does grep '<ns>100</ns>' | wc -l look like? Curious to know how big of a cleanup this is.

> grep "<ns>100</ns>" hewikisource-20240120-pages-meta-current.xml | wc -l
   42922
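For a breakdown across all namespaces rather than a single grep, a streaming parse avoids loading the whole dump into memory. This is a sketch, assuming the dump keeps one `<ns>` child per `<page>` element as in the excerpt above; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag: str) -> str:
    """Strip any XML namespace prefix ('{uri}page' -> 'page')."""
    return tag.rsplit("}", 1)[-1]


def count_pages_by_ns(stream) -> Counter:
    """Count <page> elements per namespace ID, reading the dump incrementally."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        for child in elem:
            if local(child.tag) == "ns":
                counts[int(child.text)] += 1
                break
        elem.clear()  # drop the page's children to keep memory usage low
    return counts
```

Calling `count_pages_by_ns(open("hewikisource-20240120-pages-meta-current.xml", "rb"))` and comparing the resulting keys against the configured namespace IDs would list every orphaned namespace, not only 100.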

Change 994360 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Assert invariant when constructing a Title from a LinkTarget

https://gerrit.wikimedia.org/r/994360

> grep "<ns>100</ns>" hewikisource-20240120-pages-meta-current.xml | wc -l
   42922

The database agrees:

MariaDB [hewikisource_p]> select count(*) from page where page_namespace = 100;
+----------+
| count(*) |
+----------+
|    42922 |
+----------+

They are all redirects, though:

MariaDB [hewikisource_p]> select count(*) from page where page_namespace = 100 and page_is_redirect=0;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

Change 994360 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Assert invariant when constructing a Title from a LinkTarget

https://gerrit.wikimedia.org/r/994360

I'm a sysop on hewikisource. If it makes things easier, you can restore this namespace and I will run a bot to delete all these pages. I'd prefer that you delete them from the database directly, but restoring the namespace is an option.
To prevent this from recurring, I suggest automatically deleting every page in a namespace when the namespace is deleted (we actually thought this was already the case, which is why we hadn't deleted these pages before), or at least ensuring that all pages in the namespace have been deleted.

Change 997591 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.19.0-a16

https://gerrit.wikimedia.org/r/997591

Change 997591 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.19.0-a16

https://gerrit.wikimedia.org/r/997591