Warmup script does not warm memcached enough
Closed, Resolved (Public)

Description

When repooling eqiad read-only in T331541: 14 March 2023 eqiad Service repooling, memcached started timing out and serving from the gutter pool.
https://grafana.wikimedia.org/goto/NGzRyxaVz?orgId=1
https://grafana.wikimedia.org/goto/IJqgybaVk?orgId=1

We can see MySQL reads spike when the eqiad appservers were repooled, then decrease once memcached caught up:
https://grafana.wikimedia.org/goto/EiLgwxaVz?orgId=1

We need to check why the warmup cookbook sre.switchdc.mediawiki.00-optional-warmup-caches did not sufficiently warm up memcached.

Event Timeline

Clement_Goubert triaged this task as High priority.

I just wanted to mention that despite the sudden spike in DB reads, our databases kept up just fine in general. We did have timeouts on some enwiki (s1) replicas, but they were mostly handled well.

Clement_Goubert lowered the priority of this task from High to Medium. (Mar 14 2023, 2:14 PM)

I think https://gerrit.wikimedia.org/r/c/operations/puppet/+/892570 would have smoothed this out, at least in part -- we just didn't get it reviewed in time for the switchover. We'll have to try it out on the next one instead. (Unless we stage a depool and cache wipe just to test the warmup script, but that's probably not necessary.)

892570 is merged now, and I think we'll be in better shape for the next switchover. @Clement_Goubert I'm tempted to resolve this, and reopen if we find we still have a problem the next time we have to do a warmup -- but your call.

Clement_Goubert claimed this task.

I'm good with that.