Upgrade memcached for Debian Stretch/Buster
Open, Normal, Public

Description

In T129963 we explored some newer versions of memcached to deploy for the MW object cache, but we never decided to upgrade all the shards. All the mcXXXX hosts are running Jessie, and sooner or later we'll have to think about either Stretch or Buster :)

The major complication is that Redis and the MW Session Storage are co-located on the same nodes, so upgrading the OS means upgrading both Memcached and Redis at the same time.

While we could wait for the new Session Storage Service to be alive (that should in theory get rid of Redis in favor of something else), I would like to choose a new version of memcached and try it on a couple of Production shards for a couple of months to study and tune settings, since from T129963 we know that a lot has changed. Some highlights:

  • the maximum number of slab classes for a "recent" 1.4 or 1.5 version of memcached is 64, while we are currently using a lot more (160+) on each shard due to the growth factor that we use. In T129963 we tested increasing the growth factor to 1.15, and it seemed to work nicely.
  • the LRU logic has been completely changed, more info in https://github.com/memcached/memcached/blob/master/doc/new_lru.txt and https://memcached.org/blog/modern-lru
  • SLAB automover - freed memory can be reclaimed back into a global pool and reassigned to new slab classes (currently memory assigned to a slab class cannot be reclaimed, even if free, for another use).
  • new features are now ready to use and tested by a lot of people already.
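To make the first bullet more concrete, here is a rough sketch of how the number of slab classes depends on the growth factor. It is a simplified model, not memcached's actual code (the real logic lives in slabs.c and differs between versions). The 96-byte starting chunk, the 8-byte alignment, and the 1 MiB largest chunk are assumptions modeled on common defaults, not values read from the mcXXXX configuration, and `slab_class_sizes` is a hypothetical helper name.

```python
def slab_class_sizes(growth_factor, base_chunk=96, max_chunk=1024 * 1024,
                     align=8, class_limit=None):
    """Approximate the chunk size of each slab class.

    All defaults are assumptions for illustration. class_limit, if set,
    caps the total number of classes (e.g. 64 for recent versions).
    """
    if growth_factor <= 1.0:
        raise ValueError("growth_factor must be greater than 1.0")

    sizes = []
    size = base_chunk
    while size <= max_chunk / growth_factor:
        # Leave room for the final, largest class when a limit applies.
        if class_limit is not None and len(sizes) >= class_limit - 1:
            break
        # Round the chunk size up to the alignment boundary.
        if size % align:
            size += align - (size % align)
        sizes.append(size)
        next_size = int(size * growth_factor)
        # Guard added for this sketch: always make progress, even for
        # growth factors very close to 1.0.
        if next_size <= size:
            next_size = size + align
        size = next_size

    # The last class always holds the largest allowed chunk.
    sizes.append(max_chunk)
    return sizes


if __name__ == "__main__":
    for factor in (1.05, 1.15, 1.25):
        unlimited = len(slab_class_sizes(factor))
        capped = len(slab_class_sizes(factor, class_limit=64))
        print(f"factor={factor}: {unlimited} classes uncapped, {capped} with a 64-class cap")
```

Under these assumptions a smaller growth factor yields many more classes than a larger one, which is why a cap of 64 classes forces a rethink of the factor. The exact counts depend on the real item header size and the configured maximum item size, so treat the printed numbers as indicative only.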

One solution could be to decide on a version to test/use (either Stretch's or Buster's), backport it to Jessie (that shouldn't be too difficult in theory) and start testing in deployment-prep/prod as soon as possible, to be ready for a reimage to Stretch/Buster when the time comes.
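When comparing an old and a new shard during such testing, one simple data point is the number of slab classes actually in use, which memcached reports through the `stats slabs` command. The sketch below only parses text already fetched from a server; the sample values are made up for illustration, and `parse_slab_stats` is a hypothetical helper name.

```python
def parse_slab_stats(raw):
    """Split `stats slabs` output into per-class and global counters.

    Per-class lines look like "STAT <class_id>:<field> <value>",
    global lines look like "STAT <field> <value>".
    """
    classes = {}
    totals = {}
    for line in raw.splitlines():
        parts = line.strip().split()
        if len(parts) != 3 or parts[0] != "STAT":
            continue  # skips "END" and anything unexpected
        key, value = parts[1], parts[2]
        if ":" in key:
            class_id, field = key.split(":", 1)
            classes.setdefault(int(class_id), {})[field] = int(value)
        else:
            totals[key] = int(value)
    return classes, totals


# Made-up sample output, trimmed to a few fields for illustration.
SAMPLE_OUTPUT = """\
STAT 1:chunk_size 96
STAT 1:total_pages 2
STAT 2:chunk_size 120
STAT 2:total_pages 5
STAT active_slabs 2
STAT total_malloced 7340032
END
"""

if __name__ == "__main__":
    classes, totals = parse_slab_stats(SAMPLE_OUTPUT)
    print("slab classes in use:", len(classes))
    print("active_slabs reported:", totals.get("active_slabs"))
```

Running the same parser against a Jessie shard and a test shard would show how the class layout changes with the new version and growth factor.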

Event Timeline

elukey triaged this task as Normal priority. Jan 7 2019, 3:50 PM
elukey created this task.
Restricted Application added a subscriber: Aklapper. Jan 7 2019, 3:50 PM
Paladox added a subscriber: Paladox. Jan 7 2019, 8:56 PM

When backporting this to jessie, we'll need to carefully review the systemd hardening options used in the memcached systemd unit; some of the features are likely not yet supported in the systemd version in jessie.
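A review like this could be partly scripted. The sketch below scans a unit file for hardening directives and flags those that need a newer systemd than the target (jessie ships systemd 215). The minimum versions in the table are assumptions from memory and should be verified against the systemd NEWS file before relying on them; the sample unit is hypothetical and is not the actual Debian memcached unit.

```python
# Assumed minimum systemd version per directive; verify against systemd NEWS.
ASSUMED_MIN_VERSION = {
    "NoNewPrivileges": 187,
    "PrivateDevices": 209,
    "ProtectSystem": 214,
    "MemoryDenyWriteExecute": 231,
    "RestrictRealtime": 231,
    "ProtectKernelTunables": 232,
    "ProtectControlGroups": 232,
    "RestrictNamespaces": 233,
}


def unsupported_directives(unit_text, systemd_version):
    """Return (directive, assumed_min_version) pairs that are too new."""
    flagged = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, section headers, and non-assignments.
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        required = ASSUMED_MIN_VERSION.get(name)
        if required is not None and required > systemd_version:
            flagged.append((name, required))
    return flagged


# Hypothetical unit fragment, for illustration only.
SAMPLE_UNIT = """\
[Service]
NoNewPrivileges=true
PrivateDevices=true
ProtectSystem=full
MemoryDenyWriteExecute=true
RestrictRealtime=true
ProtectKernelTunables=true
"""

if __name__ == "__main__":
    for name, required in unsupported_directives(SAMPLE_UNIT, 215):
        print(f"{name} needs systemd >= {required} (assumed)")
```

Directives missing from the table are silently ignored, so the table would need to cover everything the real unit uses before the output can be trusted.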

Given that the Debian buster release isn't too far away and has 1.5.6 (and we could ask the maintainer if an update to 1.5.12 is in the works), I'd prefer to test with a backport of 1.5.6. By the time an update in production is ready, buster will be released, and it feels like a bit of a waste to perform a massive migration effort to stretch for something we'd soon need to upgrade again.

elukey added a comment. Jan 9 2019, 3:55 PM

I'd prefer 1.5.6 too: we'd be really close to upstream (atm 1.5.12) and getting help from them would surely be easier if needed. I'd also love to be able to provide feedback to the memcached project about scalability and/or bottlenecks of running a recent version at scale (for example, LRU special use case, etc.).

The downside is of course that 1.5.6 is a lot different than our current version, so extensive testing will be needed :)

elukey added a subscriber: faidon. Jan 11 2019, 9:50 AM
jijiki added a subscriber: jijiki. Jan 18 2019, 11:40 AM
jijiki removed a subscriber: elukey.
jijiki moved this task from Backlog to Incoming on the serviceops board. Jan 18 2019, 12:03 PM
jijiki added a subscriber: elukey. Jan 18 2019, 1:55 PM
elukey updated the task description. Feb 9 2019, 2:37 PM
elukey added a comment. Edited Feb 9 2019, 2:39 PM

Found https://github.com/memcached/memcached/issues/446 today, in which upstream warns about stability issues with 1.5.6 that should have been resolved with 1.5.12. Let's keep it in mind when testing new versions :)

EDIT: after a chat with upstream it was suggested to me to follow up with Debian to avoid shipping 1.5.6 since it contains some bugs resolved in later versions. I'll try to follow up with Debian upstream asap!

I got an answer from the Debian memcached maintainer: he was not able to upload 1.5.12 before the Buster freeze, so that version will probably be offered as a Buster backport.

Also leaving here a reference to https://github.com/memcached/memcached/issues/359:

Regression in systemd-based sandboxing in 1.5.6

Can you file a bug in Debian, please? Given that testing has 1.5.6 this can still be cherrypicked despite buster being frozen for new upstream releases.

elukey moved this task from Backlog to Stalled on the User-Elukey board. Apr 15 2019, 12:55 PM
elukey moved this task from Stalled to Backlog on the User-Elukey board. Apr 16 2019, 11:02 AM
Joe moved this task from Incoming to Backlog on the serviceops board. Jun 21 2019, 8:45 AM
elukey moved this task from Backlog to Mcrouter/Memcached on the User-Elukey board. Jul 5 2019, 6:53 AM