To finish the MediaWiki configuration side of the mcrouter deployment, a series of steps needs to be carried out in stages. As planned, each step is followed by a waiting period of a day or more before the next one begins.
Steps
- Direct cache writes to both nutcracker and mcrouter for mediawiki.org (https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440469/); wait 1 day
- Direct cache writes to both nutcracker and mcrouter for all wikis (https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440470/); wait 1 day. This doubles cache writes and space usage until the multi-write stage is over.
- Switch cache reads to mcrouter on testwiki/mediawiki.org; wait 3 days
- Switch cache reads to mcrouter on all wikis; wait 1 week
- Remove nutcracker from cache write operations; wait 1 day. From this point on rollback is trickier: it requires either restarting the cache servers or relying on purgeChangedFiles.php/purgeChangedPages.php.
- Enable prefix-based wildcard purges for mcrouter on the test wikis and mediawiki.org; wait 2 days. Any rollback to nutcracker would need to revert this too, because nutcracker does not understand wildcard purges: it would delete only keys literally named after the wildcard pattern, so in practice nothing would be purged.
- Enable prefix-based wildcard purges for mcrouter for all wikis.
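The multi-write and read-switch stages above can be pictured with a minimal Python sketch. It is an illustration only, not the MediaWiki (PHP) implementation: `DictBackend` and `MultiWriteCache` are hypothetical names, and the two routes are modelled as independent in-memory stores, which is an assumption made to show why writes and space usage double during this stage.

```python
class DictBackend:
    """Hypothetical stand-in for one cache route (nutcracker or mcrouter)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        return True


class MultiWriteCache:
    """Send every write to all backends; serve reads from one chosen backend."""

    def __init__(self, backends, read_from):
        self.backends = backends      # dict: route name -> backend
        self.read_from = read_from    # route name currently used for reads

    def get(self, key):
        return self.backends[self.read_from].get(key)

    def set(self, key, value):
        # Every backend receives the write, so each value is stored twice.
        results = [b.set(key, value) for b in self.backends.values()]
        return all(results)


backends = {
    "nutcracker": DictBackend("nutcracker"),
    "mcrouter": DictBackend("mcrouter"),
}
cache = MultiWriteCache(backends, read_from="nutcracker")

# Multi-write stage: both routes receive the value while reads stay on nutcracker.
cache.set("examplewiki:page:1", "v1")

# Read-switch stage: reads move to mcrouter, which has already been warmed.
cache.read_from = "mcrouter"
warm_value = cache.get("examplewiki:page:1")
```

Because the second route is populated before reads move to it, the switch does not start from a cold cache, and switching `read_from` back is enough to roll back for as long as both routes still receive writes.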
Monitor
- Relevant logstash channels: ObjectCache, memcached, and the mediawiki-errors aggregate channel
- Relevant grafana dashboards: https://grafana.wikimedia.org/dashboard/db/prometheus-memcached-dc-stats?orgId=1
- Graphite: "MediaWiki.wanobjectcache.*.hit.good.rate" and "MediaWiki.wanobjectcache.*.miss.compute.rate" should look sane during all steps (since warmup is involved)
- Other things to watch: performance.wikimedia.org graphs
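One way to judge whether the Graphite series "look sane" is to combine them into a hit ratio and watch for dips after each step. The sketch below assumes that the hit.good and miss.compute rates can be treated as the two halves of all lookups, which the source does not state. The function names, the baseline, and the allowed drop are hypothetical values chosen for illustration.

```python
def hit_ratio(hit_rate, miss_rate):
    """Return hits / (hits + misses), or None when there is no traffic."""
    total = hit_rate + miss_rate
    if total == 0:
        return None
    return hit_rate / total


def flag_dips(samples, baseline, max_drop=0.10):
    """Return indexes of samples whose hit ratio fell more than max_drop below baseline.

    samples is a list of (hit_rate, miss_rate) pairs, oldest first.
    The baseline and max_drop thresholds are illustrative, not recommended values.
    """
    flagged = []
    for i, (hit, miss) in enumerate(samples):
        ratio = hit_ratio(hit, miss)
        if ratio is not None and ratio < baseline - max_drop:
            flagged.append(i)
    return flagged


# Example: a brief dip while a newly selected route warms up, then recovery.
samples = [(90.0, 10.0), (50.0, 50.0), (88.0, 12.0)]
dips = flag_dips(samples, baseline=0.9)
```

A short dip right after a read switch is expected because of warmup. A dip that does not recover would be a reason to pause before the next step.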
Known issues and patches:
- makeKey() using the generic fallback key encoding instead of the underlying backend's encoding.
- Task: T198279: Exception "Key contains invalid characters" from MemcachedBagOStuff.php
- Patch for MultiWriteBagOStuff: (unaffected, was already correct)
- Patch for ReplicatedBagOStuff: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/442245/ (merged)
- wmf/1.32.0-wmf.10: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/442326/ (deployed)
- wmf/1.32.0-wmf.14: Naturally.
- add() expecting exclusive action to succeed on multiple backends
- Task: T198280: Beta Cluster: Unable to obtain lock via objectcache (memcached add() fails)
- Patch for MultiWriteBagOStuff: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/442813/ (merged)
- wmf/1.32.0-wmf.10: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445005/ (deployed)
- wmf/1.32.0-wmf.14: Naturally.
- Patch for ReplicatedBagOStuff: (Not needed.)
- merge() using mergeViaLock() instead of mergeViaCas()
- Patch for MultiWriteBagOStuff: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445040/ (merged)
- wmf/1.32.0-wmf.13: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/446763/ (merged)
- wmf/1.32.0-wmf.14: Naturally.
- Patch for ReplicatedBagOStuff: Unaffected.
- makeKeyInternal() using the generic fallback encoding instead of the underlying backend's encoding.
- Patch for MultiWriteBagOStuff and ReplicatedBagOStuff: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/444868/ (merged)
- wmf/1.32.0-wmf.12: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445579/ (deployed)
- wmf/1.32.0-wmf.14: Naturally.
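The add() issue listed above can be illustrated with a small Python sketch. It is not the code from the linked patches. It assumes one plausible failure mode, in which two routes end up reaching the same underlying store, so an add() that must succeed on every backend conflicts with itself. The class and function names (`DictBackend`, `naive_add`, `primary_add`) are hypothetical.

```python
class DictBackend:
    """Hypothetical in-memory store with memcached-like add/set semantics."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        # add() only succeeds when the key does not exist yet.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def set(self, key, value):
        self.data[key] = value
        return True


def naive_add(backends, key, value):
    """Call add() on every backend and require all of them to succeed."""
    return all([b.add(key, value) for b in backends])


def primary_add(backends, key, value):
    """Decide exclusivity on the first backend only, then relay with set()."""
    primary, *others = backends
    if not primary.add(key, value):
        return False  # someone else really holds the key
    for backend in others:
        backend.set(key, value)
    return True


# Assumption for illustration: both routes reach the same underlying store.
shared = DictBackend()
routes = [shared, shared]

naive_result = naive_add(routes, "lock:a", 1)       # second add() hits our own write
primary_result = primary_add(routes, "lock:b", 1)   # exclusivity checked only once
second_attempt = primary_add(routes, "lock:b", 2)   # a genuine conflict still fails
```

Checking exclusivity on a single backend keeps the lock semantics intact while still copying the value to the other routes, so a lock cannot fail merely because it was written twice.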
Other patches:
- minor fix to MultiWriteBagOStuff::doWrite. – https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445012/ (merged)
- wmf/1.32.0-wmf.14: Naturally.
- make BagOStuff::mergeViaLock() timeout more sensible. – https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445314/ (merged)
- wmf/1.32.0-wmf.14: Naturally.
- improve logging. – https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445427/ (merged)
- wmf/1.32.0-wmf.14: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/448172/ (deployed)