Page MenuHomePhabricator

PHP Notice: Undefined index: format
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   PHP Notice: Undefined index: format
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.21/includes/language/Message.php(296)
#0 /srv/mediawiki/php-1.37.0-wmf.21/includes/language/Message.php(296): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 [internal function]: Message->unserialize(string)
#2 /srv/mediawiki/php-1.37.0-wmf.21/includes/libs/objectcache/MemcachedPeclBagOStuff.php(342): Memcached->getMulti(array)
#3 /srv/mediawiki/php-1.37.0-wmf.21/includes/libs/objectcache/MediumSpecificBagOStuff.php(626): MemcachedPeclBagOStuff->doGetMulti(array, integer)
#4 /srv/mediawiki/php-1.37.0-wmf.21/includes/libs/objectcache/wancache/WANObjectCache.php(582): MediumSpecificBagOStuff->getMulti(array)
#5 /srv/mediawiki/php-1.37.0-wmf.21/includes/libs/objectcache/wancache/WANObjectCache.php(1594): WANObjectCache->fetchKeys(array, array)
#6 /srv/mediawiki/php-1.37.0-wmf.21/includes/libs/objectcache/wancache/WANObjectCache.php(1541): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array, array)
#7 /srv/mediawiki/php-1.37.0-wmf.21/includes/specialpage/ChangesListSpecialPage.php(945): WANObjectCache->getWithSetCallback(string, integer, Closure)
#8 /srv/mediawiki/php-1.37.0-wmf.21/includes/specialpage/ChangesListSpecialPage.php(963): ChangesListSpecialPage::getChangeTagListSummary(ResourceLoaderContext)
#9 /srv/mediawiki/php-1.37.0-wmf.21/includes/specialpage/ChangesListSpecialPage.php(878): ChangesListSpecialPage::getChangeTagList(ResourceLoaderContext)
#10 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoaderFileModule.php(1387): ChangesListSpecialPage::getRcFiltersConfigVars(ResourceLoaderContext, GlobalVarConfig, NULL)
#11 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoaderFileModule.php(365): ResourceLoaderFileModule->getPackageFiles(ResourceLoaderContext)
#12 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoaderModule.php(765): ResourceLoaderFileModule->getScript(ResourceLoaderContext)
#13 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoaderModule.php(733): ResourceLoaderModule->buildContent(ResourceLoaderContext)
#14 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoader.php(1130): ResourceLoaderModule->getModuleContent(ResourceLoaderContext)
#15 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoader.php(838): ResourceLoader->makeModuleResponse(ResourceLoaderContext, array, array)
#16 /srv/mediawiki/php-1.37.0-wmf.21/load.php(51): ResourceLoader->respond(ResourceLoaderContext)
#17 /srv/mediawiki/php-1.37.0-wmf.21/load.php(38): wfLoadMain()
#18 /srv/mediawiki/w/load.php(3): require(string)
#19 {main}
Impact
Notes

25000 occurrences immediately after rolling 1.37.0-wmf.23 to all wikis. I am guessing some object got saved without format which is not recognized by the 1.37.0-wmf.21 version of the code?

Maybe a race condition during deployment, but it seems some object is stored in cache and is not back compatible.

Details

Request URL
https://en.wikipedia.org/w/load.php?lang=*&modules=*&skin=*&version=*

Event Timeline

hashar triaged this task as Unbreak Now! priority.Sep 15 2021, 7:17 PM
hashar created this task.

Less occurrences but looks related:

Call to a member function parse() on boolean

from /srv/mediawiki/php-1.37.0-wmf.21/includes/specialpage/ChangesListSpecialPage.php(967)
#0 /srv/mediawiki/php-1.37.0-wmf.21/includes/specialpage/ChangesListSpecialPage.php(878): ChangesListSpecialPage::getChangeTagList(ResourceLoaderContext)
#1 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoaderFileModule.php(1387): ChangesListSpecialPage::getRcFiltersConfigVars(ResourceLoaderContext, GlobalVarConfig, NULL)
#2 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoaderFileModule.php(365): ResourceLoaderFileModule->getPackageFiles(ResourceLoaderContext)
#3 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoaderModule.php(765): ResourceLoaderFileModule->getScript(ResourceLoaderContext)
#4 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoaderModule.php(733): ResourceLoaderModule->buildContent(ResourceLoaderContext)
#5 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoader.php(1130): ResourceLoaderModule->getModuleContent(ResourceLoaderContext)
#6 /srv/mediawiki/php-1.37.0-wmf.21/includes/resourceloader/ResourceLoader.php(838): ResourceLoader->makeModuleResponse(ResourceLoaderContext, array, array)
#7 /srv/mediawiki/php-1.37.0-wmf.21/load.php(51): ResourceLoader->respond(ResourceLoaderContext)
#8 /srv/mediawiki/php-1.37.0-wmf.21/load.php(38): wfLoadMain()
#9 /srv/mediawiki/w/load.php(3): require(string)
#10 {main}

Less occurrences but looks related:

Call to a member function parse() on boolean

Quick guess: the cached value cannot be unserialized, getWithSet returns false, and the callee calls Message methods on it.

Change 721313 had a related patch set uploaded (by Daimona Eaytoy; author: Gergő Tisza):

[mediawiki/core@wmf/1.37.0-wmf.21] Message: Remove deprecated format property

https://gerrit.wikimedia.org/r/721313

So anything which caches Message objects is affected -- meaning there's no single cache key that we can change to work around this issue. And this particular issue happens when the value is cached on .23 (which doesn't have the "format" key) and retrieved on .21 (which has it), so cross-wiki. I see two obvious solutions:

  1. Backport the culprit patch to wmf.21, so all versions of MW running in prod have the same code and there's no incompatibility
  2. Continue setting 'format' in Message::serialize, using some default value

I think 1 would be saner, so I've prepared a backport.

Ugh, sorry for the disruption. I agree the first approach is the way to go.

For future reference, a forward-and-backward compatible approach for something like this might have been to split the patch in master with a trivial part that is future-compat (e.g. stop reading "format" and/or tolerate absence) which can then be trivially backported with minimal testing, and the larger remaining patch that rides the train (e.g. which removes support and stops producing it).

Also, for cases where the code isn't as central like Message.php, you can do it in a single patch by adding a version [ 'version' => ] to the getWithSet call. This is a WANObjectCache feature that ensures the key remains unchanges such that cross-version purging still works for global cache keys, but ensures the values live separately.

We have a task that has a lot of deep technical details about serialization/unserialization from cache issues which I can't find. But the end result is that unserialize() from cache should no more be used: T161647 . That is merely for the context.

The proposed fix is to remove the format property in wmf.21 code by backporting the cache. That would let wmf.21 unserialize objects that got cached by wmf.23. Once deployed, my understanding is the cache will still have objects serialized by the old wmf.21 code and thus contain the format property, they would thus fail to unserialize leading to the same issue. Or am I misunderstanding?

Once deployed, my understanding is the cache will still have objects serialized by the old wmf.21 code and thus contain the format property, they would thus fail to unserialize leading to the same issue. Or am I misunderstanding?

I have not check the code in detail, but generally, an extra key does not cause issues. Only a missing key does.

Different question: do I understand correctly that getWithSet() returns false of a cache entry was found but failed to serialize? Shouldn't it just treat this the same as a cache miss and generate the value using the callback?

Summary from a conversation I had with @daniel and @ArielGlenn , I asked for clarification about how we can transition correctly when we run two versions of a class in production but a single cache for both. Copy pasting their answers with some formatting:


First train roll: the new code must be able to read and write the old and new formats but writes of the new format are not enabled; then we wait for the old version of mw no longer to be deployed (i.e. train rolls successfully).
Next train, the new code is enabled to write the new format and we wait for the old cache entries to disappear.
When the last of those is gone, we can remove the code that reads/writes the old format, on the third train roll, and the transition is done.

Serialization changes have to be deployed in three phases: the first patch introduces forward compatibility, the second patch startes writing the new format but retains beackwards compatibility, and the third patch removes backwards compatibility. The third patch has to wait until all old cache entries have expired, which may be a month or even longer in some cases (or never, if we are talking about things like HistoryBlob).

Note that the issue can happen with cross-wiki reads, but also in case of rollbacks: if the train gets rolled back, old code is reading new data. If it'S not forward compatible, things explode.


Which is super clear, thank you!

Change 721313 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.21] Message: Remove deprecated format property

https://gerrit.wikimedia.org/r/721313

Mentioned in SAL (#wikimedia-operations) [2021-09-16T10:59:34Z] <hashar@deploy1002> Synchronized php-1.37.0-wmf.21/includes/language/Message.php: Message: Remove deprecated format property - T146416 T291124 (duration: 01m 06s)

hashar assigned this task to Tgr.

Marking resolved since all patches got deployed. Will reopen if that cause troubles when promoting group 1 in one hour from now.