Page MenuHomePhabricator

Scattering of PHP Warning: Memcached::setMulti(): failed to set key global:segment:...
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error message
[Exception ErrorException] (/srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MemcachedPeclBagOStuff.php:350) PHP Warning: Memcached::setMulti(): failed to set key global:segment:enwiki%3Apcache%3Aidhash%3A8618-0!canonical!responsiveimages=0:93c0af008814066ad53190b968fe5f52450ea1ce

I'm again seeing a steady trickle of similar warnings to those from T247562.

Impact

Unclear, but presumably low at present, apart from noise in logspam-watch:

brennen@mwlog1001:/srv/mw-log$ grep -c 'PHP Warning: Memcached::setMulti(): failed to set key global:segment' ./error.log
74

Perhaps this is a normal level of noise, but raising here since it seemed to be absent for a while.

Se also: T247562: Warning: Memcached::setMulti(): failed to set key global:segment:...

Details

Request URL
https://en.wikipedia.org/wiki/David_Beckham
Stack Trace
[Exception ErrorException] (/srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MemcachedPeclBagOStuff.php:350) PHP Warning: Memcached::setMulti(): failed to set key global:segment:enwiki%3Apcache%3Aidhash%3A8618-0!canonical!responsiveimages=0:93c0af008814066ad53190b968fe5f52450ea1ce
  #0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
  #1 /srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MemcachedPeclBagOStuff.php(350): Memcached->setMulti(array, integer)
  #2 /srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MediumSpecificBagOStuff.php(611): MemcachedPeclBagOStuff->doSetMulti(array, integer, integer)
  #3 /srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MediumSpecificBagOStuff.php(808): MediumSpecificBagOStuff->setMulti(array, integer, integer)
  #4 /srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MediumSpecificBagOStuff.php(169): MediumSpecificBagOStuff->makeValueOrSegmentList(string, ParserOutput, integer, integer)
  #5 /srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MultiWriteBagOStuff.php(327): MediumSpecificBagOStuff->set(string, ParserOutput, integer, integer)
  #6 /srv/mediawiki/php-1.35.0-wmf.28/includes/libs/objectcache/MultiWriteBagOStuff.php(150): MultiWriteBagOStuff->doWrite(array, boolean, string, array)
  #7 /srv/mediawiki/php-1.35.0-wmf.28/includes/parser/ParserCache.php(355): MultiWriteBagOStuff->set(string, ParserOutput, integer, integer)
  #8 /srv/mediawiki/php-1.35.0-wmf.28/includes/poolcounter/PoolWorkArticleView.php(213): ParserCache->save(ParserOutput, WikiPage, ParserOptions, string, integer)
  #9 /srv/mediawiki/php-1.35.0-wmf.28/includes/poolcounter/PoolCounterWork.php(125): PoolWorkArticleView->doWork()
  #10 /srv/mediawiki/php-1.35.0-wmf.28/includes/page/Article.php(787): PoolCounterWork->execute()
  #11 /srv/mediawiki/php-1.35.0-wmf.28/includes/actions/ViewAction.php(66): Article->view()
  #12 /srv/mediawiki/php-1.35.0-wmf.28/includes/MediaWiki.php(519): ViewAction->show()
  #13 /srv/mediawiki/php-1.35.0-wmf.28/includes/MediaWiki.php(305): MediaWiki->performAction(Article, Title)
  #14 /srv/mediawiki/php-1.35.0-wmf.28/includes/MediaWiki.php(973): MediaWiki->performRequest()
  #15 /srv/mediawiki/php-1.35.0-wmf.28/includes/MediaWiki.php(535): MediaWiki->main()
  #16 /srv/mediawiki/php-1.35.0-wmf.28/index.php(47): MediaWiki->run()
  #17 /srv/mediawiki/w/index.php(3): require(string)
  #18 {main}

Event Timeline

Not sure if there is anything for CPT to do here. May be a problem in the driver, or the service.

The problem is that these errors are reported to Logstash and affecting our ability to deploy code and monitor production.

The specific way in which these errors are being reported has not been seen before March 2020, it started then (T247562).

In general Memcached issues are tolerated and indeed don't matter. That is why MW has always reported them only to an information channel for debugging purposes.

This task is to confirm that these new ones that are appearing in the PHP error channel are indeed also fine, and then to make them stop being reported to the PHP error channel. Ideally knowing why it started happening as that might provide a clean way for making it stop as well.

eprodromou subscribed.

This seems more like a Performance issue than a CPT issue. Let us know if there's anything else we can help with.

@eprodromou what's the performance issue here? That's a PHP error.

@eprodromou what's the performance issue here? That's a PHP error.

I suppose Evan means the error is in code owned by the Performance team.

Krinkle triaged this task as High priority.EditedJun 23 2020, 6:27 PM

Production error. Needs to be investigated per earlier comments. This is a high-confidence regression that started 2 months ago.

Investigation might show that the backend service is failing worse then before, in which case we need to know that and hand-off to SRE. Or it might show that these are known issues we've previously acknowledge through a low-severity log channel and are now leaking into the primary error monitor due to a bug introduced elsewherre in core, in which case the core wiring needs to be adjusted so that it is filtered out again like it once was.

Context from a single sample with surrounding messages from the same reqId:

  • reqId: 57e91ddc-2460-454a-b178-5969d0a1c768
channelmessage
errorPHP Warning: Memcached::setMulti(): failed to set key global:segment:dewiki%3Astashedit-info-v1:…:
StashEditFailed to cache parser output for key 'dewiki:stashedit-info-v1:…:…'.
memcachedMemcached error: A TIMEOUT OCCURRED
memcachedMemcached error for key "dewiki:stashedit:text:…" on server "127.0.0.1:11213": A TIMEOUT OCCURRED

These are still seen in prod. Looking into this again, and specifically trying to determine why they started showing up in this specific way. From a higher level perspective, it's entirely expected that some memcached fetches fail. And in terms of API contract, these appear the same as a cache miss, and cause essentially no user impact so long as they are at a low frequent and not consistently any particular person, wiki, page, etc. The main issue is that this showing up as a PHP warning is unexpected. It is meant to be quiet, and given that it has been quiet in the past, I'm hestitant to just slap a warning supression on it as that would obscure other warnings that could be helpful to still catch as real regressions (e.g. size failures, serialization failurs, etc.)

We currently run:

Version => 3.0.1
libmemcached version => 1.0.18

Looking at the upstream code, this error comes specifically from the Memcached::setMulti() method (upstream source code). It's not clear to me why the php-memcached handler for ::set(), as backed by php_memc_store_impl, does not issue a warning for the same type of issue, there it just returns false quiet a warning.

The warning was introduced upstream in commit a930b41 and first released in php-memcached 3.0.0, which we've been using in production since well before this task nor task T247562 (specifically, there is a SAL entry from 2019 in which a patch was applied to the already-installed php-memcached 3.0.1 package at that time.

There are a number of upstream support issues that mention this:

As such, it seems most likely that the issue started due to a change in our code. Looking at the sack traces, the ones I found all involved ParserCache keys, and more specifically those accessed via ParserCache->save / MultiWriteBagOStuff->set / MediumSpecificBagOStuff->makeValueOrSegmentList / .. / Memcached->setMulti. Which is an interesting case because it calls setMulti internally for what is on the surface not a batch operation. This is a relatively new feature (2019) invoked via the WRITE_ALLOW_SEGMENTS flag, to mitigate T204742.

So, my understanding is as follows:

  • When any kind of write operation fails, the reason is made available to us via Memcached::getResultMessage() which we report to Logstash under the memcached channel via MemcachedPeclBagOStuff::checkResult.
  • When such failure happens during setMulti, php-memcached issues an additional useless warning. One that appears to have been added upstream by mistake.
  • The blobs we store via ParserCache are probably the most likely to hit such write failure, and since we started using segmented blobs, we started using setMulti to store them.

I recommend we simply suppress warnings on the setMulti call for now. I've mentioned the same on upstream php-memcached #409 and submitted a pull request to remove this.

Change 713569 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] objectcache: Ignore \"failed to set\" warning from php-memcached

https://gerrit.wikimedia.org/r/713569

Change 713569 merged by jenkins-bot:

[mediawiki/core@master] objectcache: Ignore \"failed to set\" warning from php-memcached

https://gerrit.wikimedia.org/r/713569