
Switch ChronologyProtector from redis to memcached
Closed, Resolved, Public

Description

This will save a ~0.5 ms Redis connection on most requests (requests that write to FileRepo will still use the Redis lock manager).
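A minimal sketch of why swapping the store should be transparent to ChronologyProtector. It assumes the protector only needs get and set-with-TTL from its stash. The names below (InMemoryStash, ChronologyProtectorSketch, the "chronology:" key prefix, the 10-second TTL) are hypothetical and are not the MediaWiki Rdbms API. A dict stands in for either RedisBagOStuff or MemcachedPeclBagOStuff.

```python
import time
from typing import Dict, Optional, Protocol, Tuple


class Stash(Protocol):
    """Minimal key-value interface this sketch assumes a stash provides."""

    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str, ttl: int) -> None: ...


class InMemoryStash:
    """Hypothetical stand-in for a Redis- or memcached-backed store."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # An expired entry behaves like a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ttl: int) -> None:
        self._data[key] = (value, time.monotonic() + ttl)


class ChronologyProtectorSketch:
    """Remembers a client's last primary DB position so that the client's
    next request can wait for a replica to reach it (read-your-writes)."""

    def __init__(self, stash: Stash, client_id: str, ttl: int = 10) -> None:
        self.stash = stash
        self.key = f"chronology:{client_id}"  # hypothetical key format
        self.ttl = ttl  # hypothetical TTL in seconds

    def save_position(self, position: str) -> None:
        # End of a request that wrote to the primary: store its position.
        self.stash.set(self.key, position, self.ttl)

    def load_position(self) -> Optional[str]:
        # Start of a request: None means "no recent writes by this client".
        return self.stash.get(self.key)


if __name__ == "__main__":
    cp = ChronologyProtectorSketch(InMemoryStash(), client_id="client-1")
    print(cp.load_position())  # no recent write, so the lookup misses
    cp.save_position("pos-1")
    print(cp.load_position())
```

Because the sketch depends only on the stash interface, pointing it at a different store changes where the position lives, not how it is used.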

Event Timeline

Krinkle triaged this task as Medium priority (Aug 3 2022, 5:50 AM).
Krinkle subscribed.

AFAIK the Redis lock manager uses the rdb* hosts instead (part of the dedicated "redis_misc" cluster, as opposed to redis_sessions, which is colocated on the memc hosts).

See also:

I believe (but let's empirically confirm after this is done) that if we move ChronologyProtector to memc, redis_sessions can be decommissioned as-is. That would remove the need for the migration at T280586: Move "redis_sessions" to "redis_misc" cluster, and would also allow removing nutcracker, since AFAIK rdb/lockmanager doesn't use it.
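The decommissioning argument amounts to an inventory check: a cluster with no remaining consumers can go, and nutcracker can go once nothing connects through it. A simplified sketch, assuming the only Redis consumers are the two discussed in this thread: ChronologyProtector on redis_sessions through nutcracker (consistent with the /var/run/nutcracker/redis_eqiad.sock path in the "Before" log), and the lock manager on redis_misc without nutcracker (which the comment above states only as "AFAIK"). The Consumer type and consumer names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Consumer:
    name: str
    cluster: str          # Redis cluster the consumer talks to
    via_nutcracker: bool  # whether it connects through the nutcracker proxy


def unused_clusters(consumers: List[Consumer], clusters: Set[str]) -> Set[str]:
    """Clusters with no remaining consumers are decommission candidates."""
    in_use = {c.cluster for c in consumers}
    return clusters - in_use


def nutcracker_needed(consumers: List[Consumer]) -> bool:
    """Nutcracker is still needed if any consumer connects through it."""
    return any(c.via_nutcracker for c in consumers)


# Simplified, assumed inventory based on this thread.
CLUSTERS = {"redis_sessions", "redis_misc"}
BEFORE = [
    Consumer("ChronologyProtector", "redis_sessions", via_nutcracker=True),
    Consumer("LockManager", "redis_misc", via_nutcracker=False),
]
# After the switch, ChronologyProtector uses memcached and leaves the Redis inventory.
AFTER = [c for c in BEFORE if c.name != "ChronologyProtector"]
```

With this inventory, redis_sessions only shows up as unused, and nutcracker only becomes unnecessary, in the AFTER state.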

Change 824556 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[operations/mediawiki-config@master] Switch $wgChronologyProtectorStash to "mcrouter"

https://gerrit.wikimedia.org/r/824556

Logstash mwdebug dashboard, query: (channel:DBReplication AND ChronologyProtector) OR channel:redis OR channel:memcached
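To make the grouping in that query explicit, here is a rough Python approximation of it as a filter over pasted log lines. It is only a sketch: Logstash/Lucene matches analyzed tokens, while this uses a plain substring check on the message, and the "[channel] message" line format is assumed from the excerpts that follow.

```python
import re
from typing import Iterable, List

# Assumed shape of a pasted line: "[channel] message".
LINE_RE = re.compile(r"^\[(?P<channel>[^\]]+)\]\s*(?P<message>.*)$")


def matches_query(channel: str, message: str) -> bool:
    # (channel:DBReplication AND ChronologyProtector) OR channel:redis OR channel:memcached
    return (
        (channel == "DBReplication" and "ChronologyProtector" in message)
        or channel == "redis"
        or channel == "memcached"
    )


def filter_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the lines the query would select; skip unparsable lines."""
    kept = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and matches_query(m["channel"], m["message"]):
            kept.append(line.strip())
    return kept
```

The parentheses matter: without them, the ChronologyProtector term would constrain every channel rather than only DBReplication.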

Before
  [memcached] MainWANObjectCache using store MemcachedPeclBagOStuff
  [memcached] MemcachedPeclBagOStuff::initializeClient: initializing new client instance.
  [memcached] MemcachedPeclBagOStuff debug: getMulti(WANCache:global:rdbms-server-states:1:db1157:0-1-2-3-4-5|#|v)
- [DBReplication] ChronologyProtector using store RedisBagOStuff
- [redis] RedisBagOStuff debug: get(global:Wikimedia\Rdbms\ChronologyProtector:…:v2) on /var/run/nutcracker/redis_eqiad.sock: success
  [DBReplication] Wikimedia\Rdbms\ChronologyProtector::applySessionReplicationPosition: DEFAULT (db1157) has no position
After
  [memcached] MainWANObjectCache using store MemcachedPeclBagOStuff
  [memcached] MemcachedPeclBagOStuff::initializeClient: initializing new client instance.
  [memcached] MemcachedPeclBagOStuff debug: getMulti(WANCache:global:rdbms-server-states:1:db1157:0-1-2-3-4-5|#|v)
+ [DBReplication] ChronologyProtector using store MemcachedPeclBagOStuff
+ [memcached] MemcachedPeclBagOStuff debug: get(global:Wikimedia\Rdbms\ChronologyProtector:…:v2)
+ [memcached] MemcachedPeclBagOStuff debug: result: NOT FOUND
  [DBReplication]  Wikimedia\Rdbms\ChronologyProtector::applySessionReplicationPosition: DEFAULT (db1157) has no position

[…] the last consumer is gone

[Screenshot: 2022-08-19 at 13.57.05]

Details and references to the various tasks that made this possible can be found at: https://wikitech.wikimedia.org/wiki/Redis#Cluster_redis_sessions

As expected, there is now an increase in "get" misses on the Memcache dashboard in Grafana, due to the unresolved T314434: Avoid ChronologyProtector queries on majority of pageviews that have no recent positions.
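The extra misses follow from every request now doing a memcached get() for its ChronologyProtector key, even when the client has not written anything recently. The sketch below contrasts that with one possible mitigation in the spirit of T314434: only look up the key when the request carries a hint of a recent write (for example, a short-lived cookie). The hint mechanism, the key format, and the 1 editor / 99 readers traffic mix are hypothetical, and this is not necessarily how T314434 was resolved.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CountingStash:
    """Hypothetical dict-backed stash that counts get() hits and misses."""

    data: Dict[str, str] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.data:
            self.hits += 1
            return self.data[key]
        self.misses += 1
        return None

    def set(self, key: str, value: str) -> None:
        self.data[key] = value


def load_position(
    stash: CountingStash, client_id: str, has_recent_write_hint: bool, gated: bool
) -> Optional[str]:
    """Return the stored position for a client.

    When gated is True, skip the stash lookup entirely unless the request
    carries a hint that the client wrote recently.
    """
    if gated and not has_recent_write_hint:
        return None
    return stash.get(f"chronology:{client_id}")  # hypothetical key format


def simulate(gated: bool) -> CountingStash:
    stash = CountingStash()
    stash.set("chronology:editor", "pos-1")
    # Hypothetical traffic: one editor with a recent write, 99 readers without.
    requests = [("editor", True)] + [(f"reader{i}", False) for i in range(99)]
    for client_id, hint in requests:
        load_position(stash, client_id, hint, gated)
    return stash
```

Ungated, the 99 readers each produce a miss; gated, they never touch memcached, while the editor's lookup still hits.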

[Screenshot: 2022-08-19 at 14.00.55]

Change 824556 merged by jenkins-bot:

[operations/mediawiki-config@master] Switch $wgChronologyProtectorStash to "mcrouter"

https://gerrit.wikimedia.org/r/824556

Krinkle claimed this task.