Page MenuHomePhabricator

memcached error: A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
Closed, ResolvedPublicPRODUCTION ERROR

Description

Spotted some errors in logstash, easily found via https://logstash.wikimedia.org/app/kibana#/dashboard/memcached

Memcached error for key "{memcached-key}" on server "{memcached-server}":
A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE

That seems related to 1.33.0-wmf.4 (T206658). Some related to jobrunner.discovery.wmnet / /rpc/RunSingleJob.php , but I had at least one from an api call.

The memcached key seems to be an empty string which would explain the error. I could not find stacktraces in logstash :/

Event Timeline

Does not seem to harmful, I will attempt to find the root cause tomorrow.

zeljkofilipin triaged this task as Unbreak Now! priority.Nov 14 2018, 3:36 PM
zeljkofilipin subscribed.

Train blockers are UBN.

This comment was removed by Anomie.

Change 473551 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@wmf/1.33.0-wmf.4] Add trace to debug memcached bad key error

https://gerrit.wikimedia.org/r/473551

Change 473551 merged by jenkins-bot:
[mediawiki/core@wmf/1.33.0-wmf.4] Add trace to debug memcached bad key error

https://gerrit.wikimedia.org/r/473551

Mentioned in SAL (#wikimedia-operations) [2018-11-14T17:58:57Z] <hashar@deploy1001> Started scap: php-1.33.0-wmf.4/includes/libs/objectcache/MemcachedPeclBagOStuff.php Add trace to debug memcached bad key error - T209429

Mentioned in SAL (#wikimedia-operations) [2018-11-14T18:33:04Z] <hashar@deploy1001> Finished scap: php-1.33.0-wmf.4/includes/libs/objectcache/MemcachedPeclBagOStuff.php Add trace to debug memcached bad key error - T209429 (duration: 34m 07s)

Looks like it's probably something to do with rMW51945dbca359: Use DB domain in JobQueueGroup and make WikiMap domain ID methods stricter. I'm continuing to investigate.

A representative trace:

2018-11-14 18:31:06 [W@xp6QpAAEAAAIGhifMAAACC] mw1269 mediawikiwiki 1.33.0-wmf.4 memcached ERROR: Memcached error for key "" on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE {"memcached-server":":","memcached-key":null} 
[Exception Exception] (/srv/mediawiki/php-1.33.0-wmf.4/includes/libs/objectcache/MemcachedPeclBagOStuff.php:234) A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
  #0 /srv/mediawiki/php-1.33.0-wmf.4/includes/libs/objectcache/MemcachedPeclBagOStuff.php(167): MemcachedPeclBagOStuff->checkResult(NULL, boolean)
  #1 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueue.php(491): MemcachedPeclBagOStuff->set(NULL, string, integer)
  #2 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueue.php(464): JobQueue->doDeduplicateRootJob(HTMLCacheUpdateJob)
  #3 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueue.php(345): JobQueue->deduplicateRootJob(HTMLCacheUpdateJob)
  #4 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueue.php(310): JobQueue->batchPush(array, integer)
  #5 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueueGroup.php(158): JobQueue->push(array)
  #6 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/JobQueueEnqueueUpdate.php(57): JobQueueGroup->push(array)
  #7 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/DeferredUpdates.php(270): JobQueueEnqueueUpdate->doUpdate()
  #8 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/DeferredUpdates.php(216): DeferredUpdates::runUpdate(JobQueueEnqueueUpdate, Wikimedia\Rdbms\LBFactoryMulti, string, integer)
  #9 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/DeferredUpdates.php(140): DeferredUpdates::execute(array, string, integer)
  #10 /srv/mediawiki/php-1.33.0-wmf.4/includes/MediaWiki.php(905): DeferredUpdates::doUpdates(string)
  #11 /srv/mediawiki/php-1.33.0-wmf.4/includes/MediaWiki.php(728): MediaWiki->restInPeace(string, boolean)
  #12 [internal function]: Closure$MediaWiki::doPostOutputShutdown()
  #13 {main}

All the traces so far seem to go through JobQueue::deduplicateRootJob().

I ran a full sync for deployment, hence the delay :(

We have traces now, an example:

#0 /srv/mediawiki/php-1.33.0-wmf.4/includes/libs/objectcache/MemcachedPeclBagOStuff.php(156): MemcachedPeclBagOStuff->checkResult(NULL, boolean)
#1 /srv/mediawiki/php-1.33.0-wmf.4/includes/libs/objectcache/MemcachedBagOStuff.php(56): MemcachedPeclBagOStuff->getWithToken(NULL, NULL, integer)
#2 /srv/mediawiki/php-1.33.0-wmf.4/includes/libs/objectcache/BagOStuff.php(193): MemcachedBagOStuff->doGet(NULL, integer)
#3 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueue.php(485): BagOStuff->get(NULL)
#4 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueue.php(464): JobQueue->doDeduplicateRootJob(RefreshLinksJob)
#5 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueue.php(345): JobQueue->deduplicateRootJob(RefreshLinksJob)
#6 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueue.php(310): JobQueue->batchPush(array, integer)
#7 /srv/mediawiki/php-1.33.0-wmf.4/includes/jobqueue/JobQueueGroup.php(158): JobQueue->push(array)
#8 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/LinksUpdate.php(363): JobQueueGroup->push(array)
#9 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/LinksUpdate.php(316): LinksUpdate::queueRecursiveJobsForTable(Title, string, string, string)
#10 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/LinksUpdate.php(299): LinksUpdate->queueRecursiveJobs()
#11 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/LinksUpdate.php(175): LinksUpdate->doIncrementalUpdate()
#12 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/DeferredUpdates.php(270): LinksUpdate->doUpdate()
#13 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/DeferredUpdates.php(216): DeferredUpdates::runUpdate(LinksUpdate, Wikimedia\Rdbms\LBFactoryMulti, string, integer)
#14 /srv/mediawiki/php-1.33.0-wmf.4/includes/deferred/DeferredUpdates.php(140): DeferredUpdates::execute(array, string, integer)
#15 /srv/mediawiki/php-1.33.0-wmf.4/includes/MediaWiki.php(905): DeferredUpdates::doUpdates(string)
#16 /srv/mediawiki/php-1.33.0-wmf.4/includes/MediaWiki.php(728): MediaWiki->restInPeace(string, boolean)
#17 [internal function]: Closure$MediaWiki::doPostOutputShutdown()
#18 {main}

Change 473578 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] JobQueue: Actually return the value from getRootJobCacheKey()

https://gerrit.wikimedia.org/r/473578

Change 473579 had a related patch set uploaded (by Hashar; owner: Anomie):
[mediawiki/core@wmf/1.33.0-wmf.4] JobQueue: Actually return the value from getRootJobCacheKey()

https://gerrit.wikimedia.org/r/473579

Change 473578 merged by jenkins-bot:
[mediawiki/core@master] JobQueue: Actually return the value from getRootJobCacheKey()

https://gerrit.wikimedia.org/r/473578

Change 473579 merged by jenkins-bot:
[mediawiki/core@wmf/1.33.0-wmf.4] JobQueue: Actually return the value from getRootJobCacheKey()

https://gerrit.wikimedia.org/r/473579

Mentioned in SAL (#wikimedia-operations) [2018-11-14T19:14:23Z] <hashar@deploy1001> Synchronized php-1.33.0-wmf.4/includes/jobqueue/JobQueue.php: Actually return the value from getRootJobCacheKey() - T209429 (duration: 00m 53s)

hashar assigned this task to Anomie.

Thank you Brad. There is no more errors in logstash.

Sorry for the long time it took to add the stack trace to the log context, I ran a full scap instead of syncing just the file.

Sorry for the long time it took to add the stack trace to the log context, I ran a full scap instead of syncing just the file.

No problem, things happen sometimes.

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:07 PM