Reimage one memcached shard to Buster
Open, Stalled, Medium, Public

Description

Given T251378, I propose moving one memcached shard to Buster. The goal is to verify and test with production traffic that the configuration for Buster works as expected. There has been a lot of testing in the past, but I am pretty sure that some tuning will be needed for mc10xx. The idea is to work on one shard for the moment, and then upgrade all the others once we are in a good state.

Note that the chosen mc10xx host should be removed from the Redis MediaWiki pool. It should be sufficient to check the MediaWiki config and the nutcracker one. Since a nutcracker restart is required, this might cause some user impact.

Event Timeline

elukey created this task. May 11 2020, 10:23 AM
Marostegui triaged this task as Medium priority. May 12 2020, 5:21 AM
Marostegui moved this task from Backlog to Radar on the Operations board.

Change 595810 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove mc1036/mc2036 from the Redis Nutcracker config

https://gerrit.wikimedia.org/r/595810

Some notes:

elukey changed the task status from Open to Stalled. May 13 2020, 9:56 AM

Precisely, let's hold this task until T243106 is completed.

In a separate task, I mentioned the following:

on every mcXXXX we have ~25GB of free RAM (not even used by page cache) that we currently don't use. Even if we allocated 10G on every host to be conservative, we'd end up adding 180G in total (we have ~1600G allocated in eqiad, 89G for each shard). That would be ~11% more capacity using only what we currently have. I'd suggest starting with this and seeing if the overall cluster performance improves (fewer evictions, higher get hit rate, etc.).

When we move to Buster I'd also take the opportunity to use more RAM: it is there and it should be used, in my opinion :)
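The capacity estimate quoted above can be re-derived with a few lines of Python. This is only a sketch of the arithmetic: the shard count (18), the 89G per shard and the 10G extra come from this task, while the function and constant names are made up for illustration.

```python
# Figures quoted in the comment above; the names are illustrative only.
SHARDS = 18                 # memcached shards in eqiad
ALLOCATED_PER_SHARD_G = 89  # GB currently given to memcached on each shard
EXTRA_PER_SHARD_G = 10      # conservative extra allocation per shard


def capacity_gain(shards, allocated_g, extra_g):
    """Return (total GB added, percentage increase over the current allocation)."""
    current_total = shards * allocated_g
    added_total = shards * extra_g
    return added_total, 100.0 * added_total / current_total


if __name__ == "__main__":
    added, pct = capacity_gain(SHARDS, ALLOCATED_PER_SHARD_G, EXTRA_PER_SHARD_G)
    print("Added: {}G (+{:.1f}%)".format(added, pct))
```

With the numbers above this gives 180G, a bit more than 11% on top of the roughly 1600G currently allocated.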

Change 603942 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] [WIP] memcached: allow more tunables to avoid implicit settings

https://gerrit.wikimedia.org/r/603942

Change 603942 merged by Elukey:
[operations/puppet@production] memcached: allow more tunables to avoid implicit settings

https://gerrit.wikimedia.org/r/603942

A little note about the last patch merged. There are two main memcached parameters that can influence the distribution of the slab classes' chunk size: growth factor and smallest chunk size. The algorithm used by memcached 1.5.x is something like the following:

import argparse
import math

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Generate slab list for memcached.')
    parser.add_argument('f', type=float, help='Chunk size growth factor / -f parameter')
    parser.add_argument('n', type=int, help='Minimum space allocated for key+value+flags')

    args = parser.parse_args()
    growth_factor = args.f
    # sizeof(item) + chunk_size
    chunk_size = 48 + args.n
    slab = 1
    chunk_align = 8
    max_slab_reached = False

    while True:
        if chunk_size % chunk_align:
            chunk_size += chunk_align - (chunk_size % chunk_align)
        if chunk_size >= 512000:
            chunk_size = 512000
            max_slab_reached = True
        print("Slab: {} Chunk: {}".format(str(slab), str(math.floor(chunk_size))))
        chunk_size = math.floor(chunk_size * growth_factor)
        slab += 1
        if slab >= 64 or max_slab_reached:
            break
        if slab == 63:
            print("Slab: 63 Chunk: 512000")
            break

For years we have used growth factor 1.05 and smallest chunk size 5 bytes. This is the distribution of chunk sizes for the Gutter Pool, for example:

~ python3 memc_growth_distrib.py 1.15 5
Slab: 1 Chunk: 56
Slab: 2 Chunk: 64
Slab: 3 Chunk: 80
Slab: 4 Chunk: 96
Slab: 5 Chunk: 112
Slab: 6 Chunk: 128
Slab: 7 Chunk: 152
Slab: 8 Chunk: 176
Slab: 9 Chunk: 208
[..]

The distribution can easily be checked in grafana. There is one little gotcha though, namely that the biggest slab is capped, by default, to the max chunk size, which is 512K (at least this is my understanding from reading docs + code). So running our script above we get:

[..]
Slab: 52 Chunk: 94592
Slab: 53 Chunk: 108784
Slab: 54 Chunk: 125104
Slab: 55 Chunk: 143872
Slab: 56 Chunk: 165456
Slab: 57 Chunk: 190280
Slab: 58 Chunk: 218824
Slab: 59 Chunk: 251648
Slab: 60 Chunk: 289400
Slab: 61 Chunk: 332816
Slab: 62 Chunk: 382744
Slab: 63 Chunk: 512000

But in reality there is a big jump between 62 and 63, namely 383K -> 512K. Items bigger than 512K and up to 1MB (the max size limit for an item, now configurable) will all be stored in the last slab, using one or more chunks "glued" together.

There is also another relevant thing: how come the smallest chunk size that we set is 5B, but the first slab class is 56B? The reason is that memcached adds an additional 48B for the key's metadata + book-keeping, and rounds the size up to a multiple of 8B for alignment purposes. If we check grafana for mc1027 (or any other shard) we notice that the first slabs are around 80/90B, so we should think about using the default starting slab size of 48B.
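The rounding described above can be checked in isolation. A minimal sketch, assuming the 48B per-item overhead and the 8B alignment stated in this comment; the function and constant names are illustrative, not memcached's own.

```python
ITEM_HEADER_BYTES = 48  # per-item overhead quoted above
CHUNK_ALIGN_BYTES = 8   # chunk sizes are rounded up to a multiple of this


def first_chunk_size(min_space):
    """Size of the first slab class for a given minimum key+value+flags space."""
    size = ITEM_HEADER_BYTES + min_space
    remainder = size % CHUNK_ALIGN_BYTES
    if remainder:
        size += CHUNK_ALIGN_BYTES - remainder
    return size


if __name__ == "__main__":
    for min_space in (5, 48):
        print("-n {} -> first chunk {}B".format(min_space, first_chunk_size(min_space)))
```

For a minimum space of 5B this yields 53B rounded up to 56B, and for 48B it yields 96B, which are the first slab sizes in the two listings.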

The distribution will become:

~ python3 memc_growth_distrib.py 1.15 48
Slab: 1 Chunk: 96
Slab: 2 Chunk: 112
Slab: 3 Chunk: 128
Slab: 4 Chunk: 152
Slab: 5 Chunk: 176
Slab: 6 Chunk: 208
Slab: 7 Chunk: 240
Slab: 8 Chunk: 280
Slab: 9 Chunk: 328
Slab: 10 Chunk: 384
Slab: 11 Chunk: 448
Slab: 12 Chunk: 520
Slab: 13 Chunk: 600
Slab: 14 Chunk: 696
Slab: 15 Chunk: 800
Slab: 16 Chunk: 920
Slab: 17 Chunk: 1064
Slab: 18 Chunk: 1224
Slab: 19 Chunk: 1408
Slab: 20 Chunk: 1624
Slab: 21 Chunk: 1872
Slab: 22 Chunk: 2152
Slab: 23 Chunk: 2480
Slab: 24 Chunk: 2856
Slab: 25 Chunk: 3288
Slab: 26 Chunk: 3784
Slab: 27 Chunk: 4352
Slab: 28 Chunk: 5008
Slab: 29 Chunk: 5760
Slab: 30 Chunk: 6624
Slab: 31 Chunk: 7624
Slab: 32 Chunk: 8768
Slab: 33 Chunk: 10088
Slab: 34 Chunk: 11608
Slab: 35 Chunk: 13352
Slab: 36 Chunk: 15360
Slab: 37 Chunk: 17664
Slab: 38 Chunk: 20320
Slab: 39 Chunk: 23368
Slab: 40 Chunk: 26880
Slab: 41 Chunk: 30912
Slab: 42 Chunk: 35552
Slab: 43 Chunk: 40888
Slab: 44 Chunk: 47024
Slab: 45 Chunk: 54080
Slab: 46 Chunk: 62192
Slab: 47 Chunk: 71520
Slab: 48 Chunk: 82248
Slab: 49 Chunk: 94592
Slab: 50 Chunk: 108784
Slab: 51 Chunk: 125104
Slab: 52 Chunk: 143872
Slab: 53 Chunk: 165456
Slab: 54 Chunk: 190280
Slab: 55 Chunk: 218824
Slab: 56 Chunk: 251648
Slab: 57 Chunk: 289400
Slab: 58 Chunk: 332816
Slab: 59 Chunk: 382744
Slab: 60 Chunk: 440160
Slab: 61 Chunk: 506184
Slab: 62 Chunk: 512000

That seems good to test to me. Thoughts? If nobody objects I'd try to roll it out to the gutter pool first :)

Change 605617 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::mediawiki::memcached::gutter: change slab distribution

https://gerrit.wikimedia.org/r/605617

Change 605617 merged by Elukey:
[operations/puppet@production] role::mediawiki::memcached::gutter: change slab distribution

https://gerrit.wikimedia.org/r/605617

Mentioned in SAL (#wikimedia-operations) [2020-06-16T06:25:10Z] <elukey> roll restart memcached on mc-gp* (gutter pools) to pick up new slab size distribution setting - T252391

The next steps for this task should be:

  1. Remove the nutcracker shards in https://gerrit.wikimedia.org/r/595810 (the change should re-hash their (mc1036/2036) keys to the rest of the shards, so not all the keys will be moved/shifted around). Sessions are not in Redis anymore, so it is less risky but nonetheless we may want to wait until Redis is less used. The downside is that this may not happen in months, so we'll not be able to test memcached on Buster in the meantime (with production traffic I mean, tuning slabs etc..).
  2. Reimage mc1036 to Buster, using the gutter pool's settings (possibly tuning the number of threads).

A note after checking the slab distribution on the gutter pool: the last slab sizes do not seem to follow the prediction made by the script:

STAT 49:chunk_size 94592
STAT 50:chunk_size 108784
STAT 51:chunk_size 125104
STAT 52:chunk_size 143872
STAT 53:chunk_size 165456
STAT 54:chunk_size 190280
STAT 55:chunk_size 218824
STAT 56:chunk_size 251648
STAT 57:chunk_size 289400
STAT 58:chunk_size 332816
STAT 59:chunk_size 382744
STAT 61:chunk_size 524288

It is all fine up to slab 59, then 60 is missing (probably due to the absence of anything stored, but should be there at around ~440k) and 61 is at 524k, so there is something that I didn't take into account in the script. Will check, but it should look good anyway, let's see how it behaves.
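One possible explanation, offered only as a hypothesis, is that the cap is 512 KiB (524288 bytes) rather than 512000, and that regular classes stop being generated once the next size would reach cap / growth factor, with the last class pinned to the cap. The sketch below encodes those two assumptions (neither is confirmed in this task); the function name and parameters are illustrative.

```python
def predict_slab_sizes(factor, min_space, item_header=48,
                       chunk_max=512 * 1024, max_classes=64, align=8):
    """Return predicted chunk sizes; index 0 corresponds to slab class 1.

    Assumptions (not confirmed in this task): the largest class is pinned to
    chunk_max (512 KiB), and regular classes stop being generated once the
    next size would reach chunk_max / factor.
    """
    sizes = []
    size = item_header + min_space
    # Leave room for the final, capped class.
    while len(sizes) < max_classes - 2:
        if size >= chunk_max / factor:
            break
        # Round up to the alignment boundary, as in the script above.
        if size % align:
            size += align - (size % align)
        sizes.append(size)
        size = int(size * factor)
    sizes.append(chunk_max)
    return sizes


if __name__ == "__main__":
    for slab, chunk in enumerate(predict_slab_sizes(1.15, 48), start=1):
        print("Slab: {} Chunk: {}".format(slab, chunk))
```

Under these assumptions, growth factor 1.15 with a 48B minimum gives slab 59 at 382744, slab 60 at 440160 and a final slab 61 at 524288, which is at least consistent with the STAT output above.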

elukey added a subscriber: Krinkle. Tue, Jun 23, 8:34 AM

Getting back to https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/595810/ - one thing that would be useful before merging is to dump all the keys on mc1036 and get a breakdown of the content.

Started with redis-cli -a "$(sudo grep -Po '(?<=masterauth ).*' /etc/redis/tcp_6379.conf)" -p 6379 KEYS \* > keys.txt

elukey@mc1036:~$ wc -l keys.txt
379432 keys.txt

The majority of the keys (~81%) are related to centralauth:session, which in theory should have been migrated to cask?

elukey@mc1036:~$ grep 'centralauth:session' keys.txt | wc -l
308898

Then we have global loginnotify prevSubnet:

elukey@mc1036:~$ grep global:loginnotify:prevSubnet keys.txt | wc -l
42740

Then we have chronology protector:

elukey@mc1036:~$ grep 'Wikimedia\\Rdbms\\ChronologyProtector' keys.txt | wc -l
17593

And OAUTH tokens:

elukey@mc1036:~$ grep OAUTH keys.txt | wc -l
5337
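To put the four counts above in proportion, here is a small sketch that turns them into percentages of the 379432 keys. It assumes the grep patterns do not overlap (a key matching two patterns would be counted twice); the names are illustrative.

```python
TOTAL_KEYS = 379432  # from `wc -l keys.txt`

# Counts taken from the grep commands above.
COUNTS = {
    "centralauth:session": 308898,
    "global:loginnotify:prevSubnet": 42740,
    "ChronologyProtector": 17593,
    "OAUTH": 5337,
}


def shares(counts, total):
    """Map each pattern to its percentage of total, plus an 'other' bucket."""
    result = {name: 100.0 * n / total for name, n in counts.items()}
    result["other"] = 100.0 * (total - sum(counts.values())) / total
    return result


if __name__ == "__main__":
    for name, pct in shares(COUNTS, TOTAL_KEYS).items():
        print("{:<32} {:5.1f}%".format(name, pct))
```

Under that assumption, 4864 keys fall outside the four big groups.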

The remaining keys are:

elukey@mc1036:~$ cat keys.txt| grep -v 'Wikimedia\\Rdbms\\ChronologyProtector' | grep -v 'centralauth:session' | grep -v OAUTH | grep -v global:loginnotify:prevSubnet| awk -F ":" '{$(NF--)=""; print}' | sort | uniq -c | sort -n -k 1
      1 aawiktionary abusefilter-profile group
      1 angwiki abusefilter-profile group
      1 arbcom_cswiki abusefilter-profile group
      1 arwiki abusefilter profile 22
      1 arwiki abusefilter profile 61
      1 arwiki abusefilter-profile group
      1 arwiki abusefilter throttle 106 user
      1 arwikisource abusefilter-profile v3
      1 arzwiki captcha
      1 aywiki abusefilter-profile group
      1 azwiki captcha
      1 bdwikimedia abusefilter-profile group
      1 betawikiversity captcha
      1 bewiki abusefilter-profile v3
      1 bswikiquote captcha
      1 bswikisource captcha
      1 cawikiquote abusefilter-profile group
      1 centralauth centralautologin-token 
      1 centralauth centralautologin-token 
      1 chrwiktionary captcha
      1 commonswiki abusefilter profile 117
      1 commonswiki abusefilter profile 141
      1 commonswiki abusefilter profile 154
      1 commonswiki abusefilter profile 16
      1 commonswiki abusefilter profile 166
      1 commonswiki abusefilter profile 168
      1 commonswiki abusefilter profile 170
      1 commonswiki abusefilter profile 57
      1 commonswiki abusefilter profile 89
      1 commonswiki sitestatsupdate pendingdelta ss_images
      1 crhwiki captcha
      1 csbwiki captcha
      1 cswiki captcha
      1 cswiki editor-journey
      1 dewikibooks captcha
      1 dewikinews captcha
      1 dewikiquote flaggedrevs
      1 dewikisource captcha
      1 dewiktionary flaggedrevs
      1 dsbwiki captcha
      1 dvwiki captcha
      1 eewiki abusefilter-profile group
      1 elwiki abusefilter-profile group
      1 elwiki abusefilter-profile v3
      1 elwikivoyage abusefilter-profile group
      1 enwiki abusefilter profile 172
      1 enwiki abusefilter profile 260
      1 enwiki abusefilter profile 425
      1 enwiki abusefilter profile 61
      1 enwiki abusefilter profile 932
      1 enwiki abusefilter profile 989
      1 enwikibooks abusefilter-profile v3
      1 enwikinews abusefilter-profile v3
      1 enwikinews flaggedrevs
      1 enwikivoyage abusefilter-profile group
      1 eswiki abusefilter profile 42
      1 eswiki abusefilter profile 43
      1 eswiki abusefilter profile 56
      1 eswiki abusefilter profile 89
      1 eswiki abusefilter profile 9
      1 eswikibooks abusefilter-profile v3
      1 eswikinews abusefilter profile 18
      1 eswikinews captcha
      1 eswikinews flaggedrevs
      1 eswikiquote abusefilter-profile group
      1 eswikiquote captcha
      1 eswikivoyage abusefilter-profile v3
      1 euwiki abusefilter profile 169
      1 euwikibooks abusefilter-profile group
      1 euwiki editor-journey
      1 fawiki abusefilter profile 125
      1 fawiki abusefilter-profile group
      1 fawikiquote abusefilter-profile group
      1 fawikiquote abusefilter-profile v3
      1 fawikivoyage captcha
      1 fiwiki abusefilter profile 6
      1 frwiki abusefilter profile 19
      1 frwiki abusefilter profile 214
      1 frwiki abusefilter profile 242
      1 frwiki abusefilter profile 278
      1 frwikinews flaggedrevs
      1 frwikiquote abusefilter-profile v3
      1 frwikisource abusefilter-profile group
      1 fywikibooks abusefilter-profile group
      1 ganwiki abusefilter-profile group
      1 gdwiki captcha
      1 global abusefilter throttle metawiki 206 user
      1 global watchlist-recent-updates arwiki
      1 global watchlist-recent-updates arzwiki
      1 global watchlist-recent-updates azwiki
      1 global watchlist-recent-updates be_x_oldwiki
      1 global watchlist-recent-updates bgwiki
      1 global watchlist-recent-updates cawiki
      1 global watchlist-recent-updates cswiki
      1 global watchlist-recent-updates dewikivoyage
      1 global watchlist-recent-updates enwiktionary
      1 global watchlist-recent-updates fawiki
      1 global watchlist-recent-updates fawikivoyage
      1 global watchlist-recent-updates frwiktionary
      1 global watchlist-recent-updates hiwiki
      1 global watchlist-recent-updates itwikiquote
      1 global watchlist-recent-updates rmwiki
      1 global watchlist-recent-updates tewiki
      1 global watchlist-recent-updates thwiki
      1 global watchlist-recent-updates trwiki
      1 glwiki abusefilter profile 15
      1 glwiki abusefilter-profile v3
      1 glwikiquote abusefilter-profile group
      1 gnwiki captcha
      1 gomwiki abusefilter-profile group
      1 guwikisource abusefilter-profile group
      1 gvwiktionary captcha
      1 hawiki captcha
      1 hewiki abusefilter profile 47
      1 hewiki abusefilter profile 53
      1 hewikisource abusefilter profile 132
      1 hiwiki abusefilter-profile v3
      1 hrwikibooks abusefilter-profile group
      1 hrwiki captcha
      1 huwikibooks captcha
      1 huwiki captcha
      1 hywiki abusefilter profile 196
      1 hywiki abusefilter-profile v3
      1 hywiki captcha
      1 hywiki newcomer-tasks
      1 idwiki abusefilter profile 8
      1 idwiki abusefilter-profile v3
      1 idwikibooks abusefilter-profile v3
      1 idwikiquote captcha
      1 incubatorwiki abusefilter profile 176
      1 incubatorwiki captcha
      1 itwiki abusefilter profile 267
      1 itwiki abusefilter profile 353
      1 itwiki abusefilter profile 394
      1 itwiki abusefilter profile 423
      1 itwiki abusefilter profile 472
      1 itwikinews abusefilter-profile v3
      1 itwikinews captcha
      1 itwikiversity abusefilter-profile v3
      1 itwikiversity captcha
      1 itwikivoyage abusefilter-profile v3
      1 itwiktionary abusefilter-profile v3
      1 iuwiktionary captcha
      1 jawikiquote abusefilter-profile v3
      1 jawikisource abusefilter-profile v3
      1 jvwiki captcha
      1 kabwiki abusefilter-profile group
      1 kbpwiki abusefilter-profile group
      1 kmwikibooks captcha
      1 knwiki abusefilter-profile v3
      1 kowiktionary abusefilter-profile v3
      1 kywiki abusefilter-profile group
      1 lfnwiki captcha
      1 lnwiktionary captcha
      1 loginwiki abusefilter profile 95
      1 ltgwiki captcha
      1 ltwiki captcha
      1 mediawikiwiki sitestatsupdate pendingdelta ss_total_pages
      1 metawiki abusefilter profile 101
      1 metawiki abusefilter profile 181
      1 metawiki abusefilter-profile group
      1 metawiki translate-translator-activity-v1
      1 miwiki captcha
      1 mnwwiki captcha
      1 mswiki abusefilter-profile v3
      1 mswikibooks abusefilter-profile group
      1 mswiki captcha
      1 mwlwiki abusefilter-profile group
      1 mywiki abusefilter-profile v3
      1 mznwiki abusefilter-profile group
      1 nawiki captcha
      1 nawiktionary captcha
      1 nlwiki abusefilter-profile v3
      1 nlwikibooks captcha
      1 nlwikimedia captcha
      1 nlwikinews captcha
      1 nlwiktionary captcha
      1 ocwiki abusefilter-profile group
      1 olowiki captcha
      1 orwiki captcha
      1 orwikisource captcha
      1 orwiktionary captcha
      1 pamwiki abusefilter-profile group
      1 papwiki captcha
      1 piwiki captcha
      1 plwiki abusefilter-profile group
      1 plwikisource abusefilter profile 2
      1 plwiktionary abusefilter-profile v3
      1 ptwiki abusefilter profile 114
      1 ptwiki abusefilter profile 120
      1 ptwiki abusefilter profile 139
      1 ptwiki abusefilter profile 172
      1 ptwiki abusefilter profile 31
      1 ptwiki abusefilter profile 94
      1 ptwikibooks captcha
      1 ptwikinews abusefilter profile 123
      1 ptwikinews abusefilter profile 72
      1 ptwikiquote captcha
      1 ptwikisource abusefilter-profile v3
      1 ptwikisource captcha
      1 ptwiktionary abusefilter-profile v3
      1 ptwiktionary captcha
      1 rnwiki abusefilter-profile group
      1 rowiki abusefilter profile 53
      1 rowiki captcha
      1 ruewiki captcha
      1 ruwiki abusefilter throttle 17 ip
      1 ruwikinews captcha
      1 ruwikiquote captcha
      1 rwwiki abusefilter-profile group
      1 sawiki abusefilter-profile group
      1 sawiki abusefilter-profile v3
      1 sawikisource captcha
      1 scnwiki captcha
      1 sdwiki abusefilter-profile v3
      1 siwiki captcha
      1 skwiki abusefilter throttle 33 user,page
      1 specieswiki abusefilter-profile v3
      1 suwiki captcha
      1 svwiki abusefilter profile 33
      1 svwiki abusefilter profile 53
      1 svwiki abusefilter profile 89
      1 svwiktionary abusefilter-profile v3
      1 szlwiki abusefilter-profile group
      1 tawiki abusefilter profile 181
      1 tawiki abusefilter-profile v3
      1 tawiki captcha
      1 tawikisource captcha
      1 tawiktionary captcha
      1 tcywiki captcha
      1 testwiki centralnotice bannerfields
      1 tewiki captcha
      1 tkwiki abusefilter-profile group
      1 tpiwiktionary abusefilter-profile group
      1 trwikisource abusefilter-profile group
      1 trwiktionary abusefilter-profile group
      1 trwiktionary abusefilter-profile v3
      1 trwiktionary captcha
      1 tumwiki captcha
      1 ukwiki abusefilter-profile v3
      1 ukwiki captcha
      1 urwiki abusefilter profile 169
      1 urwiki captcha
      1 viwiki abusefilter-profile v3
      1 viwiki newcomer-tasks
      1 warwiki abusefilter-profile group
      1 wikidatawiki abusefilter profile 108
      1 wikidatawiki abusefilter profile 111
      1 wikidatawiki abusefilter profile 87
      1 wikidatawiki abusefilter profile 92
      1 wikidatawiki abusefilter throttle new user,page
      1 yowiki captcha
      1 zh_min_nanwikisource captcha
      1 zhwiki abusefilter profile 127
      1 zhwiki abusefilter profile 231
      1 zhwiki abusefilter profile 69
      1 zhwiki abusefilter throttle 253 user,ip
      1 zhwikivoyage abusefilter-profile v3
      1 zh_yuewiki abusefilter-profile v3
      2 atjwiki captcha
      2 azwiki abusefilter-profile v3
      2 bnwiki abusefilter-profile v3
      2 cawiki abusefilter-profile v3
      2 cswiktionary abusefilter-profile v3
      2 dewikibooks abusefilter-profile v3
      2 dewiktionary abusefilter-profile v3
      2 dewiktionary captcha
      2 eewiki captcha
      2 elwiki captcha
      2 enwiki abusefilter throttle 806 user
      2 enwikiquote abusefilter-profile v3
      2 enwikiquote captcha
      2 enwikisource abusefilter-profile v3
      2 frwiktionary abusefilter-profile v3
      2 global watchlist-recent-updates elwiki
      2 global watchlist-recent-updates glwiki
      2 global watchlist-recent-updates kowiki
      2 global watchlist-recent-updates svwiki
      2 hiwiki captcha
      2 huwiki editor-journey
      2 iewiki captcha
      2 incubatorwiki abusefilter-profile v3
      2 jawiki abusefilter-profile v3
      2 jawikisource captcha
      2 kkwiki abusefilter-profile v3
      2 ltwiki abusefilter-profile v3
      2 lvwiki captcha
      2 maiwiki abusefilter-profile v3
      2 mediawikiwiki abusefilter-profile v3
      2 metawiki captcha
      2 mrwiki abusefilter throttle 127 page
      2 mywiki captcha
      2 ruwikiquote abusefilter-profile v3
      2 ruwiktionary abusefilter-profile v3
      2 skwiki abusefilter-profile v3
      2 slwiki captcha
      2 sourceswiki captcha
      2 thwiki captcha
      2 uzwiki captcha
      2 vecwiki abusefilter-profile v3
      2 zhwikinews abusefilter-profile v3
      3 bgwiki abusefilter-profile v3
      3 bjnwiki captcha
      3 centralauth centralautologin-token
      3 dewiki abusefilter throttle 242 user
      3 etwiki captcha
      3 frwikiversity captcha
      3 global watchlist-recent-updates idwiki
      3 global watchlist-recent-updates metawiki
      3 global watchlist-recent-updates rowiki
      3 guwiki abusefilter-profile v3
      3 newiki abusefilter-profile v3
      3 plwiki abusefilter-profile v3
      3 specieswiki captcha
      3 trwiki abusefilter-profile v3
      3 ukwiki editor-journey
      4 centralauth api-token
      4 ckbwiki abusefilter-profile v3
      4 cswiki newcomer-tasks
      4 enwikivoyage abusefilter-profile v3
      4 eswiki abusefilter-profile v3
      4 global watchlist-recent-updates huwiki
      4 global watchlist-recent-updates nlwiki
      4 global watchlist-recent-updates ptwiki
      4 kowiki newcomer-tasks
      4 nlwiki captcha
      4 testwiki abusefilter-profile v3
      4 wikidatawiki abusefilter-profile v3
      4 wikidatawiki captcha
      5 cswiki abusefilter-profile v3
      5 dewiki abusefilter-profile v3
      5 enwiki abusefilter throttle 279 user,page
      5 enwiktionary abusefilter-profile v3
      5 euwiki captcha
      5 fawiki captcha
      5 frwiktionary captcha
      5 global watchlist-recent-updates plwiki
      5 global watchlist-recent-updates ukwiki
      5 global watchlist-recent-updates zhwiki
      5 huwiki abusefilter-profile v3
      5 kowiki editor-journey
      5 shwiki captcha
      5 svwiki abusefilter-profile v3
      6 commonswiki abusefilter-profile v3
      6 fiwiki abusefilter-profile v3
      6 fiwiki captcha
      6 global watchlist-recent-updates ruwiki
      6 global watchlist-recent-updates wikidatawiki
      6 hewiki abusefilter-profile v3
      6 metawiki abusefilter-profile v3
      6 rowiki abusefilter-profile v3
      6 ruwiki abusefilter-profile v3
      6 viwiki captcha
      7 ptwiki abusefilter-profile v3
      7 simplewiki abusefilter-profile v3
      7 viwiki editor-journey
      7 zhwiki abusefilter-profile v3
      8 arwiki abusefilter-profile v3
      8 arwiki captcha
      8 enwiki abusefilter-profile v3
      8 enwikivoyage captcha
      8 global watchlist-recent-updates eswiki
      8 global watchlist-recent-updates hewiki
      8 mrwiki abusefilter-profile v3
      8 trwiki captcha
      9 frwiki newcomer-tasks
      9 thwiki abusefilter-profile v3
     10 fawiki abusefilter-profile v3
     10 frwiki abusefilter-profile v3
     10 itwiki abusefilter-profile v3
     10 simplewiktionary captcha
     11 arwiki newcomer-tasks
     11 global watchlist-recent-updates jawiki
     11 itwiki captcha
     11 jawiki captcha
     12 global watchlist-recent-updates commonswiki
     13 ruwiki captcha
     14 global watchlist-recent-updates itwiki
     15 arwiki abusefilter throttle 175 user
     16 global watchlist-recent-updates frwiki
     16 zhwiki captcha
     17 mediawikiwiki captcha
     20 plwiki captcha
     21 enwikiversity captcha
     21 enwiktionary captcha
     21 idwiki captcha
     23 arwiki editor-journey
     30 simplewiki captcha
     33 global watchlist-recent-updates dewiki
     34 arwiki abusefilter throttle 175 ip
     37 eswiki captcha
     60 ptwiki captcha
     65 mrwiki abusefilter throttle 9 user,ip
     75 testwiki ResourceLoaderModule-dependencies
     89 cswiki abusefilter throttle new ip,page
     89 global watchlist-recent-updates enwiki
    101 cswiki abusefilter throttle new user,page
    111 global loginnotify known
    180 global loginnotify new
    204 frwiki captcha
    277 dewiki captcha
    449 mediawikiwiki ResourceLoaderModule-dependencies
    585 commonswiki captcha
    680 enwiki captcha
    792 metawiki centralnotice bannerfields

@Krinkle do you know what kind of impact users would see if the above keys disappeared (to be recreated later on in other shards)?

Side note: This question is also interesting from a DC switchover perspective (T243316) since that will also effectively be a Redis flush. In previous switchovers we only explicitly handled replication for sessions data, and now that's out of Redis. If there's anything else in there that we can't afford to drop and recreate, now would be a great time to know that.

elukey added a subscriber: aaron. Fri, Jul 10, 8:47 AM

@Krinkle @aaron if you have time, let's follow up on the question that I asked about what happens if a Redis shard disappears. It would be really nice to start testing a new version of memcached this quarter :)

Another idea to add in here: recently John and Moritz needed TLS for memcached and imported memcached 1.6.6 (latest upstream) into our buster repositories. We could think about moving directly to 1.6.6 for a couple of reasons:

  1. Native TLS support
  2. extstore https://memcached.org/blog/nvm-caching/

Point 2) is very interesting for two long term reasons:

  1. If possible, adding NVMe to the gutter pool could be an easy way to sustain more load than 256G of ram (assuming no saturation of the 10G NIC of course)
  2. When we need to refresh mc*, it might be possible to think about shrinking the current 18 shards into something smaller, leveraging extstore and NVMe.

Not saying that we have to do it, but just to think about it :)

Krinkle added a comment. Edited Fri, Jul 10, 5:40 PM

CentralAuth and ChronologyProtector are both still high-profile consumers of main stash. Both are scheduled for migration, but currently only with relatively weak organisation pressure through being a blocker for Multi-DC.

Before I can speculate about the user-impact, I'd first need to know about the technical impact. What does it mean, for how Redis works right now, that there is one less server? Does everything re-hash, or only the keys from that server? How big a percentage is it?

Would it be possible to replace the memc part of it with a spare server instead? E.g. it would become a redis-only server (I suspect not feasible, but I can ask!)

> CentralAuth and ChronologyProtector are both still high-profile consumers of main stash. Both are scheduled for migration, but currently only with relatively weak organisation pressure through being a blocker for Multi-DC.
>
> Before I can speculate about the user-impact, I'd first need to know about the technical impact. What does it mean, for how Redis works right now, that there is one less server? Does everything re-hash, or only the keys from that server? How big a percentage is it?

Nutcracker should follow consistent hashing, so only the keys of mc1036 would be re-hashed to the other shards. Those are the keys listed above, which should be around 1/18th of the whole key set (since we have 18 shards).

> Would it be possible to replace the memc part of it with a spare server instead? E.g. it would become a redis-only server (I suspect not feasible, but I can ask!)

In theory the long term plan is to have all the mc shards be memcached-only, so I'd prefer not to use another server for this (and we probably don't have one with the same spec among the spares anyway) :)
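To illustrate the consistent-hashing argument, here is a toy hash ring showing that removing one of 18 servers only moves the keys that server owned. It is a generic sketch and not nutcracker's actual implementation: the MD5-based ring, the 160 points per server, the shard names and the synthetic keys are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a 32-bit point on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest()[:8], 16)


class HashRing:
    """Toy consistent-hash ring with several points per server."""

    def __init__(self, servers, points_per_server=160):
        self._ring = sorted(
            (_hash("{}-{}".format(server, i)), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, key):
        """Return the server owning key: first ring point at or after its hash."""
        index = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


def moved_fraction(servers, removed, keys):
    """Fraction of keys whose owner changes when `removed` leaves the ring."""
    before = HashRing(servers)
    after = HashRing([s for s in servers if s != removed])
    moved = [k for k in keys if before.owner(k) != after.owner(k)]
    # Only keys previously owned by the removed server should change owner.
    assert all(before.owner(k) == removed for k in moved)
    return len(moved) / len(keys)


if __name__ == "__main__":
    shards = ["shard{:02d}".format(i) for i in range(1, 19)]  # hypothetical names
    sample_keys = ["key:{}".format(i) for i in range(20000)]  # synthetic keys
    fraction = moved_fraction(shards, "shard18", sample_keys)
    print("Keys moved: {:.1%} (1/18 is {:.1%})".format(fraction, 1 / 18))
```

In this toy model the moved fraction stays close to 1/18, and no key owned by a surviving shard changes owner.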