Finishing the MediaWiki config side of the mcrouter deploy requires a series of steps done in stages. As I planned it, each step is followed by a wait of a day or more before the next one starts.
- [x] Direct cache writes to both nutcracker and mcrouter for mediawiki.org (https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440469/); wait 1 day
- [] Direct cache writes to both nutcracker and mcrouter for all wikis (https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440470/); wait 1 day (this will double cache writes and space usage until this multi-write stage is over)
- [] Switch cache reads to mcrouter on testwiki/mediawiki.org; wait 3 days
- [] Switch cache reads to mcrouter on all wikis; wait 1 week
- [] Remove nutcracker from cache write operations. From this point on, rollback is trickier: it requires either a restart of the cache servers or relying on purgeChangedFiles.php/purgeChangedPages.php; wait 1 day
- [] Enable prefix-based wildcard purges for mcrouter for testwikis/mw.org. Any rollback to nutcracker would need to revert this too, because nutcracker does not understand wildcard purges: it would treat each wildcard pattern as a literal key name, so the purge would effectively purge nothing; wait 2 days
- [] Enable prefix-based wildcard purges for mcrouter for all wikis.
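
The write/read stages in the checklist above can be sketched as a small model. This is only an illustration under stated assumptions, not the actual mediawiki-config change: `DictBackend`, `StagedCache`, and the stage names in `STAGES` are hypothetical, and the two proxies are replaced by in-memory dictionaries. It shows why the multi-write stage warms mcrouter before reads move over, and why nutcracker goes stale once it is dropped from writes.

```python
class DictBackend:
    """In-memory stand-in for one memcached proxy (illustration only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        return True


# Hypothetical stage table: (backends written to, backend read from).
STAGES = {
    "nutcracker-only": (("nutcracker",), "nutcracker"),
    "multi-write": (("nutcracker", "mcrouter"), "nutcracker"),
    "read-from-mcrouter": (("nutcracker", "mcrouter"), "mcrouter"),
    "mcrouter-only": (("mcrouter",), "mcrouter"),
}


class StagedCache:
    """Routes reads and writes according to the current rollout stage."""

    def __init__(self, backends, stage):
        self.backends = backends
        self.stage = stage

    def set(self, key, value):
        write_to, _ = STAGES[self.stage]
        # Write to every configured backend; do not stop at the first failure.
        results = [self.backends[name].set(key, value) for name in write_to]
        return all(results)

    def get(self, key):
        _, read_from = STAGES[self.stage]
        return self.backends[read_from].get(key)


backends = {"nutcracker": DictBackend(), "mcrouter": DictBackend()}
cache = StagedCache(backends, "multi-write")
cache.set("page:1", "v1")                    # lands in both backends
cache.stage = "read-from-mcrouter"
print(cache.get("page:1"))                   # mcrouter is already warm: v1
cache.stage = "mcrouter-only"
cache.set("page:1", "v2")                    # nutcracker is no longer written
print(backends["nutcracker"].get("page:1"))  # stale: v1
```

The last line is the rollback hazard: once writes stop going to nutcracker, switching reads back to it would serve outdated values unless the caches are restarted or purged.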
Relevant logstash channels: ObjectCache, memcached, and the mediawiki-errors aggregate channel
Relevant grafana dashboards: https://grafana.wikimedia.org/dashboard/db/prometheus-memcached-dc-stats?orgId=1
Graphite: "MediaWiki.wanobjectcache.*.hit.good.rate" and "MediaWiki.wanobjectcache.*.miss.compute.rate" should look sane during all steps (since warmup is involved)
Other things to watch: performance.wikimedia.org graphs
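
One way to turn the two Graphite series above into a single number to watch is a hit ratio. The sketch below assumes the `hit.good.rate` and `miss.compute.rate` values have already been fetched as plain numbers; it makes no Graphite API calls, and the function names, the baseline, and the tolerance are hypothetical choices rather than agreed thresholds.

```python
def hit_ratio(hit_rate, miss_rate):
    """Return hits / (hits + misses), or None when there is no traffic."""
    total = hit_rate + miss_rate
    if total == 0:
        return None
    return hit_rate / total


def flag_drops(samples, baseline, tolerance=0.05):
    """Return indexes of samples whose hit ratio is below baseline - tolerance.

    `samples` is a list of (hit_rate, miss_rate) pairs, oldest first.
    Samples with no traffic are skipped rather than flagged.
    """
    flagged = []
    for index, (hit_rate, miss_rate) in enumerate(samples):
        ratio = hit_ratio(hit_rate, miss_rate)
        if ratio is not None and ratio < baseline - tolerance:
            flagged.append(index)
    return flagged


# Made-up numbers for illustration only, not real production data.
samples = [(90.0, 10.0), (70.0, 30.0), (88.0, 12.0), (0.0, 0.0)]
print(flag_drops(samples, baseline=0.9))
```

Comparing the flagged sample times against the time each step was deployed would show whether a drop lines up with a stage change.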
### Known issues and patches:
* makeKey() using the fallback encoding instead of the encoding.
** Task: {T198279}
** Patch for MultiWriteBagOStuff: (unaffected, was already correct)
** Patch for ReplicatedBagOStuff: <https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/442245/> (merged)
*** ~~wmf/1.32.0-wmf.10: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/442326/ (deployed)~~
*** wmf/1.32.0-wmf.12: Naturally.
*** wmf/1.32.0-wmf.13: Naturally.
* add() expecting exclusive action to succeed on multiple backends
** Task: {T198280}
** Patch for MultiWriteBagOStuff: <https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/442813/> (merged)
*** ~~wmf/1.32.0-wmf.10: <https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445005/> (deployed)~~
*** wmf/1.32.0-wmf.12: Naturally.
*** wmf/1.32.0-wmf.13: Naturally.
** Patch for ReplicatedBagOStuff: (not needed?)
* merge() using mergeViaLock() instead of mergeViaCas()
** Patch for MultiWriteBagOStuff: <https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445040/> (**not merged**)
*** wmf/1.32.0-wmf.12: **TODO**
*** wmf/1.32.0-wmf.13: **TODO**
** Patch for ReplicatedBagOStuff: Unaffected.
* makeKeyInternal() using the fallback instead.
** Patch for MultiWriteBagOStuff and ReplicatedBagOStuff: <https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/444868/> (merged)
*** ~~wmf/1.32.0-wmf.12: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445579/ (deployed)~~
*** wmf/1.32.0-wmf.13: Naturally.
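
The add() issue in the list above can be illustrated with a toy model. The sketch assumes in-memory backends and shows one possible way to handle an exclusive add() across several backends: decide on the first backend only, then relay the value to the others with set(). `DictBackend`, `naive_add`, and `primary_add` are hypothetical names, and this is not necessarily what the linked patch does.

```python
class DictBackend:
    """In-memory stand-in for one cache backend (illustration only)."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        # add() is exclusive: it only succeeds if the key is absent.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def set(self, key, value):
        self.data[key] = value
        return True


def naive_add(backends, key, value):
    """Problematic variant: expects add() to succeed on every backend."""
    results = [backend.add(key, value) for backend in backends]
    return all(results)


def primary_add(backends, key, value):
    """Let the first backend decide, then overwrite the rest with set()."""
    if not backends[0].add(key, value):
        return False
    for backend in backends[1:]:
        backend.set(key, value)
    return True


# The secondary backend holds a leftover copy of the key; the primary does not.
primary, secondary = DictBackend(), DictBackend()
secondary.set("lock:job", "old")
print(naive_add([primary, secondary], "lock:job", "new"))    # False
print(primary.data)                                          # {'lock:job': 'new'}

primary, secondary = DictBackend(), DictBackend()
secondary.set("lock:job", "old")
print(primary_add([primary, secondary], "lock:job", "new"))  # True
print(secondary.data)                                        # {'lock:job': 'new'}
```

In the naive variant the caller is told the add failed even though the primary now holds the key, so a lock-style caller would never release something it actually acquired.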
### Other patches:
* minor fix to MultiWriteBagOStuff::doWrite. – <https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445012/> (merged)
** wmf/1.32.0-wmf.12: **TODO**
** wmf/1.32.0-wmf.13: Naturally.
* make BagOStuff::mergeViaLock() timeout more sensible. – <https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445314/> (merged)
** wmf/1.32.0-wmf.12: **TODO**
** wmf/1.32.0-wmf.13: Naturally.
* improve logging. <https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445427/>
** wmf/1.32.0-wmf.12: **TODO**
** wmf/1.32.0-wmf.13: **TODO**