Page MenuHomePhabricator

"PHP Notice: Undefined index: key" and similar in Cite.php and ReferenceStack.php
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

MediaWiki version: 1.35.0-wmf.8

message
PHP Notice: Undefined index: key

Notes

A large spike of these and similar for indices key and number (in Cite.php) and key, count, and text in ReferenceStack.php, after deployment of 1.35.0-wmf.8 to all wikis.

Also some for values that appear to be coming from the query. From logspam-watch:

2  ErrorException  (/srv/mediawiki/php-1.35.0-wmf.8/extensions/Cite/src/ReferenceStack.php:185) PHP Notice: Undefined index: itis
2  ErrorException  (/srv/mediawiki/php-1.35.0-wmf.8/extensions/Cite/src/ReferenceStack.php:185) PHP Notice: Undefined index: brr
1  ErrorException  (/srv/mediawiki/php-1.35.0-wmf.8/extensions/Cite/src/ReferenceStack.php:185) PHP Notice: Undefined index: RIAA
1  ErrorException  (/srv/mediawiki/php-1.35.0-wmf.8/extensions/Cite/src/ReferenceStack.php:185) PHP Notice: Undefined index: Marantz-31-10-16
1  ErrorException  (/srv/mediawiki/php-1.35.0-wmf.8/extensions/Cite/src/ReferenceStack.php:185) PHP Notice: Undefined index: belgian

Details

Request ID
Xe6LzApAEDUAAJKI0mQAAABF
Request URL
https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text|displaytitle&section=0&page=Rohingya_genocide&origin=*
Stack Trace
exception.trace
#0 /srv/mediawiki/php-1.35.0-wmf.8/extensions/Cite/src/Cite.php(635): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.35.0-wmf.8/extensions/Cite/src/Cite.php(548): Cite\Cite->referencesFormatEntry(string, array)
#2 /srv/mediawiki/php-1.35.0-wmf.8/extensions/Cite/src/Cite.php(929): Cite\Cite->referencesFormat(string, boolean)
#3 /srv/mediawiki/php-1.35.0-wmf.8/extensions/Cite/src/Hooks/CiteParserHooks.php(60): Cite\Cite->checkRefsNoReferences(boolean, ParserOptions, ParserOutput, string)
#4 /srv/mediawiki/php-1.35.0-wmf.8/includes/Hooks.php(174): Cite\Hooks\CiteParserHooks::onParserAfterParse(Parser, string, StripState)
#5 /srv/mediawiki/php-1.35.0-wmf.8/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#6 /srv/mediawiki/php-1.35.0-wmf.8/includes/parser/Parser.php(586): Hooks::run(string, array)
#7 /srv/mediawiki/php-1.35.0-wmf.8/includes/content/WikitextContent.php(368): Parser->parse(string, Title, ParserOptions, boolean, boolean, NULL)
#8 /srv/mediawiki/php-1.35.0-wmf.8/includes/content/AbstractContent.php(555): WikitextContent->fillParserOutput(Title, NULL, ParserOptions, boolean, ParserOutput)
#9 /srv/mediawiki/php-1.35.0-wmf.8/includes/api/ApiParse.php(591): AbstractContent->getParserOutput(Title, NULL, ParserOptions)
#10 /srv/mediawiki/php-1.35.0-wmf.8/includes/api/ApiParse.php(164): ApiParse->getParsedContent(WikiPage, ParserOptions, boolean, NULL, NULL, boolean)
#11 /srv/mediawiki/php-1.35.0-wmf.8/includes/api/ApiMain.php(1603): ApiParse->execute()
#12 /srv/mediawiki/php-1.35.0-wmf.8/includes/api/ApiMain.php(539): ApiMain->executeAction()
#13 /srv/mediawiki/php-1.35.0-wmf.8/includes/api/ApiMain.php(510): ApiMain->executeActionWithErrorHandling()
#14 /srv/mediawiki/php-1.35.0-wmf.8/api.php(83): ApiMain->execute()
#15 /srv/mediawiki/w/api.php(3): require(string)
#16 {main}

Event Timeline

brennen created this task.Dec 9 2019, 6:10 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 9 2019, 6:10 PM
brennen triaged this task as Unbreak Now! priority.Dec 9 2019, 6:10 PM
Restricted Application added a subscriber: Liuxinyu970226. · View Herald TranscriptDec 9 2019, 6:10 PM
brennen updated the task description. (Show Details)Dec 9 2019, 6:12 PM
Daimona added a subscriber: Daimona.Dec 9 2019, 6:35 PM
awight claimed this task.EditedDec 9 2019, 9:36 PM

There are evidently illegal array accesses coming from several places in the code on production, however I still haven't been able to reproduce any of the errors on my local machine. I've imported the erroring pages, for example [[en:Rohingya genocide]], and enabled all extensions, with no success.

Without knowing exactly what the problem is, I can only treat the symptoms. At each of the illegal array accesses, I'll add a null coalesce ??, or guard with isset. This is a bandage over the situation, and may be leaving some page output incorrect, but should be enough to unblock the train.

Another alternative would be to take the 1.35.0-wmf.5 code for the Cite extension, there have been no breaking changes so it's perfectly safe to roll back.

(wmf.5, not wmf.7; this is the first train for a month.)

Change 556066 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/Cite@wmf/1.35.0-wmf.8] Hotfix: Defensive array accesses

https://gerrit.wikimedia.org/r/556066

Change 556066 merged by jenkins-bot:
[mediawiki/extensions/Cite@wmf/1.35.0-wmf.8] Hotfix: Defensive array accesses

https://gerrit.wikimedia.org/r/556066

Mentioned in SAL (#wikimedia-operations) [2019-12-09T23:00:17Z] <brennen@deploy1001> Synchronized php-1.35.0-wmf.8/extensions/Cite: Sync [[gerrit:556066|Hotfix: Defensive array accesses (T240248)]] (duration: 00m 57s)

Change 556080 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/Cite@wmf/1.35.0-wmf.8] Hotfix: part 2 of the array access adventure

https://gerrit.wikimedia.org/r/556080

Change 556080 merged by jenkins-bot:
[mediawiki/extensions/Cite@wmf/1.35.0-wmf.8] Hotfix: part 2 of the array access adventure

https://gerrit.wikimedia.org/r/556080

Not fully fixed yet. :-(

@thiemowmde @WMDE-Fisch This has been exciting so far—some of these array access errors should be impossible, so we need to think about "improbable" culprits such as multithreading, reference counting, etc. Or just basic PHP things which I can't see.

Even more exciting is the next train deployment, *later today* at 20:00 UTC. We need to do our best to make the current Cite master branch robust and quiet. We should also prepare a revert patch to take Cite all the way back to wmf.5, in case this becomes necessary.

Reedy added a subscriber: Reedy.Dec 10 2019, 1:15 AM

https://en.wikipedia.org/wiki/Talislanta_Sorcerer%27s_Guide

This one is showing as erroring, but it's defined and not redefined... Does cite now complain if it's not defined it the first usage?

Here's a test request which has been able to trigger one of the remaining two errors, on mwdebug:
https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text|displaytitle&section=0&page=Rohingya_genocide&origin=*

Of course, the article has been fixed so now we'll need a specific revision.

Change 556153 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/Cite@master] Reduce risk of sharing storage across parsers

https://gerrit.wikimedia.org/r/556153

Change 556186 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Cite@wmf/1.35.0-wmf.8] Fix incomplete cloning of the Parser::$extCite instance

https://gerrit.wikimedia.org/r/556186

Change 556193 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Cite@master] Fix incomplete cloning of the Parser::$extCite instance

https://gerrit.wikimedia.org/r/556193

Change 556186 merged by jenkins-bot:
[mediawiki/extensions/Cite@wmf/1.35.0-wmf.8] Fix incomplete cloning of the Parser::$extCite instance

https://gerrit.wikimedia.org/r/556186

Change 556193 merged by jenkins-bot:
[mediawiki/extensions/Cite@master] Fix incomplete cloning of the Parser::$extCite instance

https://gerrit.wikimedia.org/r/556193

Change 556209 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/Cite@master] Integration test to hit cloned Cite bug

https://gerrit.wikimedia.org/r/556209

What we have done so far:

  • Some of the errors are from a parse API call, some aren't. It appears the way the parser is called is unrelated.
  • Some of the stack traces I have seen happen when an article uses {{#tag:references|…}}. This triggers the rollbackRefs() and $redoStack code, undoing and redoing some previous actions. We suspected this might mess with the ReferenceStack::$refs array in a way that's not clean.
  • There is a foreach loop in referencesFormat(), rendering the <ol> list of all individual <ref>s in a group. This code is using a copy of the values from ReferenceStack::$refs. Is it possible this code is using a conflicting copy and the original ReferenceStack::$refs the same time?
  • We suspected PHPs garbage collection to mess ReferenceStack::$refs up.
  • All the wiki pages I have seen so far contain the Cite error about a <ref> being declared twice, with conflicting content (it says "name … defined multiple times with different content"). It turns out this was indeed relevant, even if the issue might not be strictly limited to this code.
  • We suspected the ParserClearState and/or ParserCloned hook handlers being involved. Both intentionally mess with the Cite classes internal state.
  • Some array index errors happen in ReferenceStack::pushRef() in a place that looks like it is protected by an isset( $this->refs[$group][$name] ). With this check it should be impossible to run into array index errors, except the code between the isset() and the error is secretly modifying $this->refs. The only calls we can see are StripState::unstripBoth(), which looks trivial, and Parser::addTrackingCategory(), which does a suspicious Message call.

The last two happened to be the key. Under specific circumstances (involving the MessageCache) the Message class is cloning the Parser, which triggers cloning the Cite object, but does not clone the ReferenceStack object that's part of Cite. The cloned Cite cleans it's own ReferenceStack, which also cleans the original, which then start accessing unknown array elements in the code following addTrackingCategory().

An almost complete explanation of what actually happened is here: https://gerrit.wikimedia.org/r/556186

A test case that triggers the broken code path is now here: https://gerrit.wikimedia.org/r/556209

Mentioned in SAL (#wikimedia-operations) [2019-12-10T16:31:42Z] <andrew-wmde@deploy1001> Synchronized php-1.35.0-wmf.8/extensions/Cite: SWAT: [[gerrit:556186|Fix incomplete cloning of the Parser::$extCite instance (T240248)]] (duration: 01m 04s)

Change 556218 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/Cite@wmf/1.35.0-wmf.8] Catch one last undefined index

https://gerrit.wikimedia.org/r/556218

Change 556218 merged by jenkins-bot:
[mediawiki/extensions/Cite@wmf/1.35.0-wmf.8] Catch one last undefined index

https://gerrit.wikimedia.org/r/556218

Mentioned in SAL (#wikimedia-operations) [2019-12-10T17:22:22Z] <andrew-wmde@deploy1001> Synchronized php-1.35.0-wmf.8/extensions/Cite: SWAT: [[gerrit:556218|Catch one last undefined index (T240248)]] (duration: 01m 02s)

Jdforrester-WMF closed this task as Resolved.Dec 10 2019, 6:18 PM

Provisionally declaring this fixed.

Change 556209 merged by jenkins-bot:
[mediawiki/extensions/Cite@master] Integration test to hit cloned Cite bug

https://gerrit.wikimedia.org/r/556209

TheDJ added a subscriber: TheDJ.Dec 10 2019, 8:41 PM

That was an awesome explanation @thiemowmde ! I guess this whole thing showed how complex Cite truly is.
Thanks to everyone working on and dealing with this bug.

Change 559938 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Cite@master] Replace now unused native cloning feature

https://gerrit.wikimedia.org/r/559938

Change 559938 merged by jenkins-bot:
[mediawiki/extensions/Cite@master] Replace now unused native cloning feature

https://gerrit.wikimedia.org/r/559938