
Warning: Memcached::setMulti(): failed to set key global:segment:...
Closed, Resolved · Public · PRODUCTION ERROR

Description

Error message
ErrorException from line 340 of /srv/mediawiki/php-1.35.0-wmf.23/includes/libs/objectcache/MemcachedPeclBagOStuff.php: PHP Warning: Memcached::setMulti(): failed to set key global:segment:enwiki%3Apcache%3Aidhash%3A23309859-0!canonical:ce9eb2174b45be4c0a2966f5bfbf5f045e3b6388
Impact
Notes

Quickly got 40+ of these warnings in logspam-watch after rolling 1.35.0-wmf.23 to all wikis; the numbers seemed to be increasing steadily.

Details

Request ID
XmqX4ApAMM0AAKPJ0lgAAAAB
Request URL
https://en.wikipedia.org/w/api.php/w/api.php
Stack Trace
2020-03-12 20:14:26 [XmqX4ApAMM0AAKPJ0lgAAAAB] mw1379 enwiki 1.35.0-wmf.23 error ERROR: [XmqX4ApAMM0AAKPJ0lgAAAAB] /w/api.php   ErrorException from line 340 of /srv/mediawiki/php-1.35.0-wmf.23/includes/libs/objectcache/MemcachedPeclBagOStuff.php: PHP Warning: Memcached::setMulti(): failed to set key global:segment:enwiki%3Apcache%3Aidhash%3A23309859-0!canonical:ce9eb2174b45be4c0a2966f5bfbf5f045e3b6388 {"exception_id":"XmqX4ApAMM0AAKPJ0lgAAAAB","exception_url":"/w/api.php","caught_by":"mwe_handler"} 
[Exception ErrorException] (/srv/mediawiki/php-1.35.0-wmf.23/includes/libs/objectcache/MemcachedPeclBagOStuff.php:340) PHP Warning: Memcached::setMulti(): failed to set key global:segment:enwiki%3Apcache%3Aidhash%3A23309859-0!canonical:ce9eb2174b45be4c0a2966f5bfbf5f045e3b6388
  #0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
  #1 /srv/mediawiki/php-1.35.0-wmf.23/includes/libs/objectcache/MemcachedPeclBagOStuff.php(340): Memcached->setMulti(array, integer)
  #2 /srv/mediawiki/php-1.35.0-wmf.23/includes/libs/objectcache/MediumSpecificBagOStuff.php(608): MemcachedPeclBagOStuff->doSetMulti(array, integer, integer)
  #3 /srv/mediawiki/php-1.35.0-wmf.23/includes/libs/objectcache/MediumSpecificBagOStuff.php(805): MediumSpecificBagOStuff->setMulti(array, integer, integer)
  #4 /srv/mediawiki/php-1.35.0-wmf.23/includes/libs/objectcache/MediumSpecificBagOStuff.php(166): MediumSpecificBagOStuff->makeValueOrSegmentList(string, ParserOutput, integer, integer)
  #5 /srv/mediawiki/php-1.35.0-wmf.23/includes/libs/objectcache/MultiWriteBagOStuff.php(327): MediumSpecificBagOStuff->set(string, ParserOutput, integer, integer)
  #6 /srv/mediawiki/php-1.35.0-wmf.23/includes/libs/objectcache/MultiWriteBagOStuff.php(150): MultiWriteBagOStuff->doWrite(array, boolean, string, array)
  #7 /srv/mediawiki/php-1.35.0-wmf.23/includes/parser/ParserCache.php(354): MultiWriteBagOStuff->set(string, ParserOutput, integer, integer)
  #8 /srv/mediawiki/php-1.35.0-wmf.23/includes/poolcounter/PoolWorkArticleView.php(213): ParserCache->save(ParserOutput, WikiPage, ParserOptions, string, integer)
  #9 /srv/mediawiki/php-1.35.0-wmf.23/includes/poolcounter/PoolCounterWork.php(125): PoolWorkArticleView->doWork()
  #10 /srv/mediawiki/php-1.35.0-wmf.23/includes/page/WikiPage.php(1236): PoolCounterWork->execute()
  #11 /srv/mediawiki/php-1.35.0-wmf.23/includes/api/ApiParse.php(603): WikiPage->getParserOutput(ParserOptions, integer, boolean)
  #12 /srv/mediawiki/php-1.35.0-wmf.23/includes/api/ApiParse.php(166): ApiParse->getParsedContent(WikiPage, ParserOptions, boolean, NULL, NULL, boolean)
  #13 /srv/mediawiki/php-1.35.0-wmf.23/includes/api/ApiMain.php(1590): ApiParse->execute()
  #14 /srv/mediawiki/php-1.35.0-wmf.23/includes/api/ApiMain.php(522): ApiMain->executeAction()
  #15 /srv/mediawiki/php-1.35.0-wmf.23/includes/api/ApiMain.php(493): ApiMain->executeActionWithErrorHandling()
  #16 /srv/mediawiki/php-1.35.0-wmf.23/api.php(84): ApiMain->execute()
  #17 /srv/mediawiki/w/api.php(3): require(string)
  #18 {main}

Related Objects

Status     Subtype            Assigned   Task
Resolved   Release            hashar
Resolved   Release            brennen
Resolved   PRODUCTION ERROR   Krinkle

Event Timeline

Krinkle moved this task from General to libs/objectcache on the MediaWiki-libs-BagOStuff board.
Krinkle subscribed.

It is normal for Memcached to sometimes time out or fail (we prefer fast failure, and tolerate errors there). As such, those are configured to go to their own channel (channel:memcached) that we don't monitor. It returns false to the caller like a cache miss and recovers fine.
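For illustration, a minimal sketch of that contract (the key and the compute helper below are made up for this example, not taken from this task): a backend failure surfaces to the caller exactly like a miss, so the caller simply recomputes.

$cache = ObjectCache::getLocalClusterInstance();
$key = $cache->makeKey( 'example-thing', 42 ); // hypothetical key

$value = $cache->get( $key );
if ( $value === false ) {
	// A cache miss and a Memcached error look identical to the caller.
	$value = computeExampleThing( 42 ); // hypothetical expensive computation
	$cache->set( $key, $value, BagOStuff::TTL_HOUR ); // a failed set is tolerated too
}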

This, however, is the first time I have seen php-memcached emit native PHP errors. That indeed shouldn't happen and is likely a regression.

Which of the action APIs is causing this? Is it even a specific one, or all of them?

> It is normal for Memcached to sometimes time out or fail (we prefer fast failure, and tolerate errors there). As such, those are configured to go to their own channel (channel:memcached) that we don't monitor. It returns false to the caller like a cache miss and recovers fine.

FWIW, that's a problematic design that causes problems any time memcached is used for anything other than caching. E.g., IIRC, it caused T106066: Don't show "Nonce already used" error on memcache failure.

@Tgr That's a fundamental contract between MW and Memcached that I don't think we will depart from anytime soon. Non-cache use cases do not belong in Memcached. The use case of T106066 is what session store, echo store, main stash, db-replicated, or dedicated tables are for.

I'm trying to figure out if this is an actual code change or a memcached server problem with bad timing. Or maybe it's a code change that causes memcached to have a bad time? Does anyone have a theory there?

I see a lot of timeouts from the memcached server at that time. I also see requests not just to the API, but to other endpoints; e.g., wiki/Star?render threw an error during this period.

Also, the changelog entries that mention "cache" are... numerous: https://www.mediawiki.org/wiki/MediaWiki_1.35/wmf.23/Changelog

Given that this was happening across many code paths I guess I'm looking for a new use that overwhelmed the backend, or some fundamental change in the handling code (which didn't obviously [at least to me] happen in wmf.23).

There was a spike in network transmit rate on mc1029 and mc1030 at the time in question. The other servers did not show such a spike. So it may be a small number of keys with large values being frequently requested.

mc net tx spike.png (277×756 px, 24 KB)

I used admin requests to verify that out of 10 selected log entries from 20:10:36, all 10 were associated with mc1030, consistent with saturation of its outbound networking at the time.

I tried pulling out the top 100 keys by error rate from that second, but nothing really stood out. So I tried multiplying the request count of those keys by their current size in memcached, but it was still uninspiring.

I noticed that Redis is also running on mc1030. There's no visibility into what nutcracker/Redis is doing: if nutcracker times out while talking to Redis, no error is logged; it silently fails over to the next server. Statistics are incremented, and they seem to show that nutcracker was not dropping mc1030 at the time:

[0300][tstarling@mw1242:~]$ nc localhost 22222 | php -r '$s = stream_get_contents(STDIN); print str_replace("\"\"", "\"", $s);' | jq .redis_eqiad.shard12
{
  "server_eof": 0,
  "server_err": 0,
  "server_timedout": 4,
  "server_connections": 1,
  "server_ejected_at": 1569023964391925,
  "requests": 164938413,
  "request_bytes": 15263903696,
  "responses": 164938409,
  "response_bytes": 19860256544,
  "in_queue": 0,
  "in_queue_bytes": 0,
  "out_queue": 0,
  "out_queue_bytes": 0
}

This output shows that mc1030 was last ejected on 2019-09-20, but maybe it just has a very long timeout. It's hard to say whether Redis was responsible for the network spike when we don't have a traffic breakdown of Redis versus Memcached on mc1030.

We do have detailed statistics for the memcached instance, which show that the traffic originated from memcached, not Redis.

mc1030 mc network.png (285×773 px, 18 KB)

We can even see which slab the key was in:

mc1030 slabs.png (280×516 px, 22 KB)

That's slab 136 with a chunk size of 83KB.

I looked at the logged keys again. The top key from 20:13 by request rate was WANCache:t:arwiki:gadgets-definition:9:2 (by a small margin). This key appeared very rarely in log messages from another day. It's just a timestamp key, but it's fetched using getWithSetCallback() with a check key, which means that the timestamp key and the value key would have been requested together with getMulti(). The value would have been 82658 bytes, consistent with the chunk size above.
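As a rough sketch of that access pattern (this is not the actual Gadgets extension code; the key components and the callback are illustrative), a getWithSetCallback() call with a check key looks roughly like this, and the WAN cache fetches the value key and the WANCache:t:... timestamp together in one getMulti():

$cache = \MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache();

$definitions = $cache->getWithSetCallback(
	// Value key (the large ~80 KB blob), e.g. "...:gadgets-definition:9:2"
	$cache->makeKey( 'gadgets-definition', 9, 2 ),
	WANObjectCache::TTL_DAY,
	function () {
		// Recompute the gadget definition blob on a miss (hypothetical helper)
		return Gadget::loadDefinitionsFromDatabase();
	},
	[
		// Check key: a small WANCache:t:... timestamp fetched alongside the
		// value; bumping it via touchCheckKey() invalidates the cached value
		'checkKeys' => [ $cache->makeKey( 'gadgets-definition' ) ],
	]
);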

The remaining mysteries are:

  • Why don't we see log messages of the form "Memcached error: $msg", without the key in the message? If getMulti() is at fault, then you would expect MemcachedPeclBagOStuff::doGetMulti() to call checkResult() with $key = false.
  • Why would there be a spike in traffic for arwiki:gadgets-definition, when the extension code has not changed?
  • Did something go wrong with the anti-stampede and APC local caching code that is supposed to prevent a spike in requests to remote gadget definition cache keys?
  • The value key should have been routed to mc1031, so why would there be a spike in 83KB requests to mc1030?

EDIT: With gzip the key would have been 11577 bytes, too small for the chunk size of 83KB.

If we want to verify which key causes those errors, we need to coordinate between SRE and release engineering and do a partial deployment. Please reach out to serviceops so that we can schedule a test.

Any thoughts on this causing yesterday's parsercache overload at T247788: Parsercache sudden increase of connections? Even if the timestamps don't match?

WDoranWMF subscribed.

I'm going to move this to our watching state; please push it back to our inbox if we can do more to help, or just tell me.

The key in question is enwiki:messages:en, which upon release started being requested up to 10k times per second. Given the size of the key - 82788 bytes - this of course caused congestion on the poor server that had to serve it, specifically mc1030.

I suppose we have to look at whatever patch introduced this caching and revert it for now, and think of alternative approaches.

That key seems to include UI messages, which I thought were originally kept in php arrays?

> That key seems to include UI messages, which I thought were originally kept in php arrays?

You are thinking of the l10n cache, which is a cache over the JSON files. The message cache (*:messages:*) is a cache of message overrides in the MediaWiki namespace, per wiki and per language. Both cache similar things, but from a different source.

> The key in question is enwiki:messages:en, which upon release started being requested up to 10k times per second. Given the size of the key - 82788 bytes - this of course caused congestion on the poor server that had to serve it, specifically mc1030.
>
> I suppose we have to look at whatever patch introduced this caching and revert it for now, and think of alternative approaches.

*Thinking out loud*
83 KB is not too big, but when lots of nodes start requesting it at the same time, it causes network congestion. I think the easiest way to fix it is to have a copy of this in APCu. What do you think?

Would that mean we have a huge message somewhere under https://en.wikipedia.org/wiki/MediaWiki:XXX ?

The largest messages on en.wikipedia.org (just scrolling through Special:AllMessages) are:

Of these, the Spam-blacklist has been changed actively in the past few days and is presently 93,993 bytes.

So, these keys are stored in and read from MessageCache. It has a multi-level cache: a local-server cache, a cluster cache, and a WAN cache.
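A rough sketch of that lookup order (this is not MessageCache's actual code; loadFromDatabase() is a hypothetical stand-in, and the WAN-cache hash validation is omitted): check the per-server cache first, then the cluster cache, then fall back to the database.

function loadMessages( BagOStuff $srvCache, BagOStuff $clusterCache, $code ) {
	// Each cache builds its own key; with a per-wiki keyspace both resolve
	// to something like "enwiki:messages:en".
	$srvKey = $srvCache->makeKey( 'messages', $code );
	$clusterKey = $clusterCache->makeKey( 'messages', $code );

	$blob = $srvCache->get( $srvKey ); // 1. local server (APCu)
	if ( $blob === false ) {
		$blob = $clusterCache->get( $clusterKey ); // 2. cluster cache (Memcached)
		if ( $blob === false ) {
			$blob = loadFromDatabase( $code ); // 3. MediaWiki-namespace overrides (hypothetical)
			$clusterCache->set( $clusterKey, $blob );
		}
		$srvCache->set( $srvKey, $blob );
	}
	return $blob;
}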

> I think the easiest way to fix it is to have a copy of this in APCu. What do you think?

From reading the code, this should be done already.

Comparing MessageCache itself between wmf.22 and wmf.23, there don't seem to be any significant changes. Poking at other changes, it seems like https://gerrit.wikimedia.org/r/c/mediawiki/core/+/576858 landed in .23, which changes how the local server cache injected into MessageCache gets initialized. Looking at that change I can't see anything that might suggest a problem, so perhaps this is the wrong path, but I'm still adding it here as a data point.

FYI: The message cache stores the content of all messages together under one key, except those larger than $wgMaxMsgCacheEntrySize, which is 10000 bytes by default but 1024 bytes in WMF production. So this seems to be just a lot of small and medium-sized messages, not super large ones. Though, I am not sure whether a change in the large ones causes cache invalidation for the main key.
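For reference, a minimal sketch of that threshold as it would appear in a site configuration (the 1024-byte value is per the comment above; $wgMaxMsgCacheEntrySize is the real MediaWiki setting):

# LocalSettings.php
# Messages whose content exceeds this many bytes are cached individually
# instead of inside the big per-language *:messages:* blob.
# MediaWiki's default is 10000; WMF production lowers it to 1024.
$wgMaxMsgCacheEntrySize = 1024;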

Apparently with wmf.23 the key enwiki:messages:en kept being fetched from Memcached. If some messages invalidate it, I would expect the key to be fetched a few times to repopulate the local caches, but there should not be much more activity besides that.

I went onto the Beta-Cluster-Infrastructure, which has the full debug logs. Logstash is broken there, but we still have the raw udp2log messages on deployment-fluorine02.deployment-prep.eqiad.wmnet.

Looking at the MessageCache log bucket:

/srv/mw-log/MessageCache.log
2020-03-17 21:10:09 [XnE8sawQBHcAAB@hI4QAAAAW] deployment-mediawiki-07 enwiki 1.35.0-alpha MessageCache DEBUG: MessageCache using store APCUBagOStuff {"class":"APCUBagOStuff"} 
2020-03-17 21:10:09 [XnE8sawQBHcAAB@hI4QAAAAW] deployment-mediawiki-07 enwiki 1.35.0-alpha MessageCache DEBUG: MessageCache::load: Loading en... local cache has the wrong hash, got from global cache  
2020-03-17 21:10:09 [XnE8sawQBHcAAB@hI4UAAAAJ] deployment-mediawiki-07 enwiki 1.35.0-alpha MessageCache DEBUG: MessageCache using store APCUBagOStuff {"class":"APCUBagOStuff"} 
2020-03-17 21:10:09 [XnE8sawQBHcAAB@hI4UAAAAJ] deployment-mediawiki-07 enwiki 1.35.0-alpha MessageCache DEBUG: MessageCache::load: Loading en... local cache has the wrong hash, got from global cache  
...

The debug info is built in MessageCache from the language code and an indication of the source of the data. The breakdown for en:

$ grep 'Loading en\.\.\.' MessageCache.log |cut -d: -f4-|sort|uniq -c |sort -nr
  49911  MessageCache::load: Loading en... local cache has the wrong hash, got from global cache  
   4603  MessageCache::load: Loading en... got from local cache  
    913  MessageCache::load: Loading en... local cache is empty, global cache is empty, loading from database  
     13  MessageCache::load: Loading en... local cache is empty, got from global cache  
      1  MessageCache::load: Loading en... local cache validation key is expired/volatile, global cache is presumed expired, loading from database  
      1  MessageCache::load: Loading en... local cache is expired, global cache is expired, loading from database  
      1  MessageCache::load: Loading en... local cache is empty, global cache is expired/volatile, loading from database  

We have logs going back to 2020-03-02. I would expect "got from local cache" to always be the most frequent entry. That is the case, by a huge margin, until the logfile MessageCache.log-20200306.gz.

The first batches of entries for "local cache has the wrong hash, got from global cache" are from around 2020-03-05 9:00 UTC, which might match a commit that got deployed on beta at that point and would have ended up in wmf.23.

I am not sure whether that can help :/

The "MessageCache::load: Loading en... local cache has the wrong hash, got from global cache" entries do show up on mwdebug1001 with verbose logging (they can then be retrieved in Logstash). So I get either messages from the local cache, or ones with a wrong hash that are fetched from the global cache.

I have no idea why the cache varies. So that is all I have for today.

Mentioned in SAL (#wikimedia-operations) [2020-03-18T00:31:40Z] <Amir1> foreachwikiindblist medium deleteEqualMessages.php --delete (T247562)

> Mentioned in SAL (#wikimedia-operations) [2020-03-18T00:31:40Z] <Amir1> foreachwikiindblist medium deleteEqualMessages.php --delete (T247562)

I'm deleting the values that are no longer needed from that cache, which should help with the size of that cache value. There's nothing for enwiki though :/

This should be put in Puppet and run regularly.

> [On beta], the first batches of entries for "local cache has the wrong hash, got from global cache" are from around 2020-03-05 9:00 UTC. […]

I looked at the l10n-related commits, but nothing stands out there. Maybe some invisible threshold we hit, but I don't see it.

Our Jenkins logs for beta deployments from March 5th have already expired, so I couldn't figure out whether Beta deploys were broken (as they often are) in order to know what the last automatic deploys before 9 AM were. I looked through the previous day of core commits and found this one from myself:

I looked through it many times but was unable to find any difference in behaviour.

Before
# DefaultSettings
$wgObjectCaches = [
	'apcu' => [ 'class' => APCUBagOStuff::class, 'reportDupes' => false ],

];
# ObjectCache.php
public static function detectLocalServerCache() {
  /* … */ return 'apcu';
}
After
public static function makeLocalServerCache() {
  $params = [ 'reportDupes' => false ];
  return new APCUBagOStuff( $params );
}

Looks the same, right? I dumped both from a test script on deployment-mediawiki-07 in the Beta Cluster, and accessed it over HTTP:

w/krinkle.php
<?php
define( 'MW_NO_SESSION', 1 );
require_once __DIR__ . '/../multiversion/MWMultiVersion.php';
require MWMultiVersion::getMediaWiki( 'includes/WebStart.php' );
header( 'Content-Type: text/plain; charset=utf-8' );

var_dump(ObjectCache::detectLocalServerCache());
var_dump(ObjectCache::getInstance(ObjectCache::detectLocalServerCache()));
echo "\n\n";
var_dump(ObjectCache::makeLocalServerCache());
Output
string(4) "apcu"
object(APCUBagOStuff)#310 (17) {
  ["nativeSerialize":"APCUBagOStuff":private]=>
  bool(true)
  ["keyspace":protected]=>
  string(6) "enwiki"
  ["reportDupes":"MediumSpecificBagOStuff":private]=>
  bool(false)
  ["logger":protected]=>
  object(Monolog\Logger)#346 (5) {…}
  ["asyncHandler":protected]=>
  array(2) {
    [0]=>
    string(15) "DeferredUpdates"
    [1]=>
    string(17) "addCallableUpdate"
  }
}

object(APCUBagOStuff)#309 (17) {
  ["nativeSerialize":"APCUBagOStuff":private]=>
  bool(true)
  ["keyspace":protected]=>
  string(5) "local"               # <!-- BOOM
  ["reportDupes":"MediumSpecificBagOStuff":private]=>
  bool(false)
  ["logger":protected]=>
  object(Psr\Log\NullLogger)#308 (0) {
  }
  ["asyncHandler":protected]=>
  NULL
}

The main difference is that the keyspace attribute is no longer set. The keyspace is what differentiates wikis from each other when calling the non-shared makeKey() method for cache keys.

This array key isn't anywhere in the Before code above, because it was automagically injected by the ObjectCache::newFromParams factory.
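To illustrate the effect (the values here mirror the var_dump output above; this snippet is an example, not code from MediaWiki itself): makeKey() prefixes every key with the instance's keyspace, so without the injected per-wiki keyspace every wiki ends up sharing the same 'local:'-prefixed APCu keys.

$withKeyspace = new APCUBagOStuff( [ 'keyspace' => 'enwiki', 'reportDupes' => false ] );
$withoutKeyspace = new APCUBagOStuff( [ 'reportDupes' => false ] ); // keyspace defaults to 'local'

echo $withKeyspace->makeKey( 'messages', 'en' ), "\n";    // enwiki:messages:en
echo $withoutKeyspace->makeKey( 'messages', 'en' ), "\n"; // local:messages:en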

Change 580577 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] objectcache: Restore 'keyspace' for LocalServerCache service

https://gerrit.wikimedia.org/r/580577

Change 580577 merged by jenkins-bot:
[mediawiki/core@master] objectcache: Restore 'keyspace' for LocalServerCache service

https://gerrit.wikimedia.org/r/580577

Change 580598 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.35.0-wmf.24] objectcache: Restore 'keyspace' for LocalServerCache service

https://gerrit.wikimedia.org/r/580598

Change 580599 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.35.0-wmf.23] objectcache: Restore 'keyspace' for LocalServerCache service

https://gerrit.wikimedia.org/r/580599

Those patches should do it, though I am unable to deploy them this morning (kids are homeschooling). At best I will be around at ~1 PM UTC (4 hours from now).

If one has the bandwidth to deploy them and verify, please be bold.

Change 580599 merged by jenkins-bot:
[mediawiki/core@wmf/1.35.0-wmf.23] objectcache: Restore 'keyspace' for LocalServerCache service

https://gerrit.wikimedia.org/r/580599

Change 580598 merged by jenkins-bot:
[mediawiki/core@wmf/1.35.0-wmf.24] objectcache: Restore 'keyspace' for LocalServerCache service

https://gerrit.wikimedia.org/r/580598

I am going to get them deployed.

Mentioned in SAL (#wikimedia-operations) [2020-03-18T11:57:54Z] <hashar@deploy1001> Synchronized php-1.35.0-wmf.23/includes/objectcache/ObjectCache.php: objectcache: Restore keyspace for LocalServerCache service - T247562 (duration: 01m 10s)

Mentioned in SAL (#wikimedia-operations) [2020-03-18T11:59:09Z] <hashar@deploy1001> Synchronized php-1.35.0-wmf.24/includes/objectcache/ObjectCache.php: objectcache: Restore keyspace for LocalServerCache service - T247562 (duration: 01m 07s)

hashar assigned this task to Krinkle.

I have deployed the hotfixes. The reproducible case I had is fixed (mwdebug1001, browse www.mediawiki.org then meta.wikimedia.org).

Thanks to @tstarling for the memcached spike and size/slab investigation (I have learned about slabs), and to @Krinkle for figuring out the source of the cache variance. And in general, thank you to everyone who commented on this task and helped with the investigation one way or another \o/