Page MenuHomePhabricator

MemcachedPeclBagOStuff: Serialization of 'Closure' is not allowed
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error message
Serialization of 'Closure' is not allowed
Stack Trace
from /srv/mediawiki/php-1.36.0-wmf.28/includes/libs/objectcache/MemcachedPeclBagOStuff.php(228)
#0 /srv/mediawiki/php-1.36.0-wmf.28/includes/libs/objectcache/MemcachedPeclBagOStuff.php(228): Memcached->cas(integer, string, array, integer)
#1 /srv/mediawiki/php-1.36.0-wmf.28/includes/libs/objectcache/MediumSpecificBagOStuff.php(355): MemcachedPeclBagOStuff->doCas(integer, string, array, integer, integer)
#2 /srv/mediawiki/php-1.36.0-wmf.28/includes/libs/objectcache/MediumSpecificBagOStuff.php(316): MediumSpecificBagOStuff->cas(integer, string, array, integer, integer)
#3 /srv/mediawiki/php-1.36.0-wmf.28/includes/libs/objectcache/MediumSpecificBagOStuff.php(268): MediumSpecificBagOStuff->mergeViaCas(string, Closure, integer, integer, integer)
#4 /srv/mediawiki/php-1.36.0-wmf.28/includes/libs/objectcache/wancache/WANObjectCache.php(781): MediumSpecificBagOStuff->merge(string, Closure, integer, integer)
#5 /srv/mediawiki/php-1.36.0-wmf.28/includes/libs/objectcache/wancache/WANObjectCache.php(1600): WANObjectCache->set(string, array, integer, array)
#6 /srv/mediawiki/php-1.36.0-wmf.28/includes/libs/objectcache/wancache/WANObjectCache.php(2570): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array, array)
#7 /srv/mediawiki/php-1.36.0-wmf.28/includes/deferred/MWCallableUpdate.php(38): WANObjectCache->{closure}()
#8 /srv/mediawiki/php-1.36.0-wmf.28/includes/deferred/DeferredUpdates.php(513): MWCallableUpdate->doUpdate()
#9 /srv/mediawiki/php-1.36.0-wmf.28/includes/deferred/DeferredUpdates.php(390): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#10 /srv/mediawiki/php-1.36.0-wmf.28/includes/deferred/DeferredUpdates.php(221): DeferredUpdates::run(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, Monolog\Logger, BufferingStatsdDataFactory, string)
#11 /srv/mediawiki/php-1.36.0-wmf.28/includes/deferred/DeferredUpdatesScope.php(267): DeferredUpdates::{closure}(MWCallableUpdate, integer)
#12 /srv/mediawiki/php-1.36.0-wmf.28/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
#13 /srv/mediawiki/php-1.36.0-wmf.28/includes/deferred/DeferredUpdates.php(242): DeferredUpdatesScope->processUpdates(integer, Closure)
#14 /srv/mediawiki/php-1.36.0-wmf.28/includes/MediaWiki.php(1117): DeferredUpdates::doUpdates(string)
#15 /srv/mediawiki/php-1.36.0-wmf.28/includes/MediaWiki.php(853): MediaWiki->restInPeace()
#16 /srv/mediawiki/php-1.36.0-wmf.28/includes/MediaWiki.php(865): MediaWiki->{closure}()
#17 /srv/mediawiki/php-1.36.0-wmf.28/includes/MediaWiki.php(587): MediaWiki->doPostOutputShutdown()
#18 /srv/mediawiki/php-1.36.0-wmf.28/index.php(53): MediaWiki->run()
#19 /srv/mediawiki/php-1.36.0-wmf.28/index.php(46): wfIndexMain()
#20 /srv/mediawiki/w/index.php(3): require(string)
#21 {main}
Impact

Unclear.

Roughly 45 of these seen since deploy of 1.36.0-wmf.28 (T271342) to all wikis.

  1. Notes

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
brennen added a project: User-brennen.
brennen moved this task from Backlog to Logs/Train on the User-brennen board.
dancy triaged this task as Unbreak Now! priority.Feb 1 2021, 4:44 PM

I doubt this has anything to do with FeaturedFeeds, the bug is in whatever DeferredUpdate is being triggered.

I narrowed down the update being scheduled to https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/includes/libs/objectcache/wancache/WANObjectCache.php#2568

In the last line:

#0 /srv/mediawiki/php-1.36.0-wmf.28/includes/libs/objectcache/MemcachedPeclBagOStuff.php(228): Memcached->cas(integer, string, array, integer)

The exception is suggesting that the array has a Closure in it. So some WANObjectCache getWithSetCallback is accidentally returning an array that has a Closure in it...it would be helpful to have additional debug logging giving us more hints as to what is trying to cache this since we're losing the stack trace because of the DeferredUpdate.

In T273242#6795370, @Majavah wrote:

I think that verbose logging can be enabled in the WikimediaDebug browser extension, and then those entries would be visible in logstash among other places. https://wikitech.wikimedia.org/wiki/WikimediaDebug#Debug_logging

I think that verbose logging can be enabled in the WikimediaDebug browser extension, and then those entries would be visible in logstash among other places. https://wikitech.wikimedia.org/wiki/WikimediaDebug#Debug_logging

Martin and myself just generated over 30MB of debug logs after bumping a wiki to .28 on mwdebug1003, no hits for this error. We suspect that it would need some user traffic on .28 to insert those broken updates.

I have looked at the log again this morning, filtering based on wiki being commonswiki or enwiki. The URLS apparently all look like /wiki/Special:FeedItem/motd/20210122000000/en or /w/api.php?action=featuredfeed.

On https://logstash.wikimedia.org/app/dashboards#/view/mediawiki-errors

I looked at reqId:"YBM2TwpAIEIAAK@v-gEAAAAI" which was for commonswiki /wiki/Special:FeedItem/potd/20210127000000/en

Jan 28, 2021 @ 22:10:25.686Serialization of 'Closure' is not allowed
Jan 28, 2021 @ 22:10:25.687Deferred update 'MWCallableUpdate_WANObjectCache->scheduleAsyncRefresh' failed to run

Which does not gives much information even when removing all the filters there is nothing else shown.

I could not find out how to retrieve feeds on Beta-Cluster-Infrastructure . The extension is enabled but I could not find any of the feeds :/

@hashar I imported a few missing pages to beta enwiki for the enwiki featured feed to work, see https://en.wikipedia.beta.wmflabs.org/wiki/Special:FeedItem/featured/20210202000000/en for example. It has pages from Jan 28 this year, I can import more if needed

Change 661357 had a related patch set uploaded (by Majavah; owner: Majavah):
[mediawiki/extensions/FeaturedFeeds@master] Caching fixes

https://gerrit.wikimedia.org/r/661357

I wanted to reproduce the issue on beta, I guess it is possible now thank you!

It would be nice to find why the Serialization issue only appeared now.

Change 661673 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: improve async cache key refresh exception logging

https://gerrit.wikimedia.org/r/661673

In T273242#6799594, @Majavah wrote:

@hashar I imported a few missing pages to beta enwiki for the enwiki featured feed to work, see https://en.wikipedia.beta.wmflabs.org/wiki/Special:FeedItem/featured/20210202000000/en for example. It has pages from Jan 28 this year, I can import more if needed

I've tried the analogous feed link in deployment-prep as a logged-in user for Jan 28, 29, Feb 1, 2, 3 and no errors. And one as a logged-out user for good measure. No errors. What else might we try?

Created https://test.wikipedia.org/wiki/MediaWiki:Ffeed-featured-page with:

Wikipedia:Today's featured article/{{#time:F j, Y}}

Then some page: https://test.wikipedia.org/wiki/Wikipedia:Today%27s_featured_article/February_5,_2021 :

Dummy featured article for February 5, 2021. To assist with [https://phabricator.wikimedia.org/T273242 T273242].

Should show up on:
https://test.wikipedia.org/wiki/Special:FeedItem/featured/20210205000000/

https://test.wikipedia.org/wiki/Special:FeedItem/featured/20210205000000/en (albeit it shows a previous edit of the feature article, or maybe it is made to only show the first paragraph).

Browsed that with mwdebug1001 + verbose log enabled. No error shows up :-\

Change 661962 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: make WANObjectCache check for values with fragile serializability

https://gerrit.wikimedia.org/r/661962

In general, DeferredUpdates is not great at error logging. One possible improvement would be to store an exception object when an update is deferred, and chain it when the update throws an exception. That has some performance impact, but maybe not much. The other would be to catch exceptions from callable updates and use reflection to determine where the callback comes from.

A train blocker should not stay like this, danger and cost of halting a train is considerable (not to mention halting two). Please revert anything that has been causing this and do the proper fix later.

Backport deployed for improved logging on wmf.29 (currently on test.wikipedia.org): https://gerrit.wikimedia.org/r/c/mediawiki/core/+/662065

If we don't get any instances of the error over there, we might consider rolling .29 out to a bigger wiki.

Change 662714 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] WANObjectCache: warn on non-JSONic values.

https://gerrit.wikimedia.org/r/662714

Change 662724 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] WANObjectCache: throw on Closure

https://gerrit.wikimedia.org/r/662724

OK here are my entirely opinionated thoughts about how to move this ticket along.

  • We don't so far know how to reproduce this bug, and several of us have tried.
  • The error doesn't seem to "just show up" either in beta or on testwiki.
  • We need more logging even if we could reproduce it.
  • One logging changeset has been merged and backported to wmf.29, another is waiting in the wings ( https://gerrit.wikimedia.org/r/c/mediawiki/core/+/662724 )

So:

  • let's clean up the second patch and get it backported to .29
  • give another few hours of people trying to reproduce the dang bug by hand
  • give up and schedule to roll out .29 to the smallest of the wikis which had the error during the time .29 was live elsewhere than testwiki, testwikidatawiki
  • watch like a hawk for errors, as soon as we see one with some logging, roll .29 back and start investigating code paths

Sure we could tweak and merge https://gerrit.wikimedia.org/r/c/mediawiki/extensions/FeaturedFeeds/+/661357 and hope, but we'll only have negative evidence from that ("after several days we do/do not see the errors). I'd rather understand what's wrong first and then fix.

The fact that FeaturedFeeds is caching a User object is a known bug (T264391, T264363). User can contain WebRequest, WebResponse, DatabaseBlock, GlobalBlock, CommentStoreComment, Message and UserAuthority. UserAuthority is an interesting recent addition but is probably not implicated because it unconditionally throws an exception if you try to serialize it:

> $user = new User;

> $user->isAllowed('edit');

> print serialize($user);
Caught exception LogicException: Instances of MediaWiki\HookContainer\HookContainer are not serializable!
#0 [internal function]: MediaWiki\HookContainer\HookContainer->__sleep()
#1 /srv/mw/core/maintenance/eval.php(81) : eval()'d code(1): serialize()
#2 /srv/mw/core/maintenance/eval.php(81): eval()
#3 {main}

I don't know if I care enough to analyse in detail such an obviously broken and temporary situation. So my preference is to merge the FeaturedFeeds patch and hope.

Change 661357 merged by jenkins-bot:
[mediawiki/extensions/FeaturedFeeds@master] Caching fixes

https://gerrit.wikimedia.org/r/661357

Change 662677 had a related patch set uploaded (by Tim Starling; owner: Majavah):
[mediawiki/extensions/FeaturedFeeds@wmf/1.36.0-wmf.29] Caching fixes

https://gerrit.wikimedia.org/r/662677

Change 662677 merged by jenkins-bot:
[mediawiki/extensions/FeaturedFeeds@wmf/1.36.0-wmf.29] Caching fixes

https://gerrit.wikimedia.org/r/662677

Mentioned in SAL (#wikimedia-operations) [2021-02-09T01:56:55Z] <tstarling@deploy1001> Synchronized php-1.36.0-wmf.29/extensions/FeaturedFeeds: probable fix for UBN T273242 (duration: 01m 06s)

I will promote 1.36.0-wmf.29 to group 0/group 1 this afternoon at 14:00 UTC (15:00 CET).

Change 662724 merged by jenkins-bot:
[mediawiki/core@master] WANObjectCache: throw on Closure

https://gerrit.wikimedia.org/r/662724

Change 662953 had a related patch set uploaded (by ArielGlenn; owner: Daniel Kinzler):
[mediawiki/core@wmf/1.36.0-wmf.29] WANObjectCache: throw on Closure

https://gerrit.wikimedia.org/r/662953

Change 662953 merged by jenkins-bot:
[mediawiki/core@wmf/1.36.0-wmf.29] WANObjectCache: throw on Closure

https://gerrit.wikimedia.org/r/662953

Mentioned in SAL (#wikimedia-operations) [2021-02-09T14:10:28Z] <hashar@deploy1001> Synchronized php-1.36.0-wmf.29/includes/libs/objectcache/wancache/WANObjectCache.php: WANObjectCache: throw on Closure - T273242 (duration: 01m 08s)

Change 662959 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] objectache: Revert "throw on Closure" in WANObjectCache

https://gerrit.wikimedia.org/r/662959

Change 662959 merged by jenkins-bot:
[mediawiki/core@master] objectache: Revert "throw on Closure" in WANObjectCache

https://gerrit.wikimedia.org/r/662959

Change 662964 had a related patch set uploaded (by Hashar; owner: Majavah):
[mediawiki/extensions/FeaturedFeeds@wmf/1.36.0-wmf.30] Revert "Caching fixes"

https://gerrit.wikimedia.org/r/662964

reqID: YCK-uApAICkAAHoN53IAAAAE
message: Encountered unserializable value for frwiki:featured-feeds:1:fr: Serialization of 'Closure' is not allowed
exception url: 	/wiki/Sp%C3%A9cial:FeedItem/featured/20210208000000/fr

so now what can we get out of this?

cache key is db-name:featured-feeds:cache-version:language

Mahavah also said on irc this is the caching of the FeaturedFeedsChannel object, see above comment for the format of the key

My suggestion: If fixing these are complicated, just disable featuredfeeds extension and roll forward.

Stance for now is we are sticking to wmf.29 and would backport FeatureFeeds patches as they arrive. There doesn't seem to be any user facing or backend issue so we can probably afford to keep those around until they get properly fixed.

Change 662964 merged by jenkins-bot:
[mediawiki/extensions/FeaturedFeeds@wmf/1.36.0-wmf.30] Revert "Caching fixes"

https://gerrit.wikimedia.org/r/662964

So I rolled frwiki to wmf.30 in mwdebug1002 (and leave it like that until the next scap pull) and tested the given url above (/wiki/Sp%C3%A9cial:FeedItem/featured/20210208000000/fr) and it works just fine. so it seems it finally got fixed.

Yes, I confirm that no new errors have been logged since the rollout of .30 to frwiki last night. https://logstash.wikimedia.org/goto/4a87cc20fc288e85feb1f3f31539dadb

Yes, I confirm that no new errors have been logged since the rollout of .30 to frwiki last night. https://logstash.wikimedia.org/goto/4a87cc20fc288e85feb1f3f31539dadb

frwiki is on .27, isn't it? It's on .30 only on mwdebug1002 which doesn't see regular user traffic

In T273242#6817624, @Majavah wrote:

Yes, I confirm that no new errors have been logged since the rollout of .30 to frwiki last night. https://logstash.wikimedia.org/goto/4a87cc20fc288e85feb1f3f31539dadb

frwiki is on .27, isn't it? It's on .30 only on mwdebug1002 which doesn't see regular user traffic

In that case I retract my comment, I had mistakenly understood that frwiki was on 30 on all appservers.

On mwdebug1002, I ran the following to verify that there was no such key:

echo 'get WANCache:frwiki:featured-feeds:2:fr|#|v' | nc localhost 11213 -q 2 > dumpv2.txt

and checked that the contents of the file were "END". We use v2 because that's the version of the feeds with the new patches in wmf.30
I then visited https://test.wikipedia.org/wiki/Special:FeedItem/featured/20210201000000/en which gives an error about the article not existing. In spite of that, I was able to check that the above key was cached. I also checked the logs on mwlog1001 and found no related errors.

This doesn't test multilingual feeds. I wonder if we should push to mediawikiwiki on mwdebug1002 and test that next.

wmf.30 is on group1, I checked that I could load the mediawiki home page (reported as previously broken, w reqId:YCKk3ApAICkAABTZCOAAAACA from https://www.mediawiki.org/wiki/MediaWiki

Reran the test.wikipdia.org test, still works as it should.

Based on these two things it looks like .30 is safe to roll out to the next group this evening.

So is this task resolved or merely no longer a blocker?

Ladsgroup assigned this task to taavi.

It is resolved technically but kept open just in case it happens again. I resolve it and will reopen if it happens again.

Change 661673 abandoned by Aaron Schulz:
[mediawiki/core@master] objectcache: improve async cache key refresh exception logging

Reason:
duplicate

https://gerrit.wikimedia.org/r/661673