Rollout use of mcrouter for MediaWiki in production
Closed, Resolved, Public

Description

To finish the MediaWiki configuration side of the mcrouter deployment, the remaining work will be done in stages. As planned, each step is followed by a waiting period (usually about a day) before the next one starts.

Steps

  • Direct cache writes to both nutcracker and mcrouter for mediawiki.org (https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440469/); wait 1 day
  • Direct cache writes to both nutcracker and mcrouter for all wikis (https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440470/); wait 1 day (this will double cache writes and space usage until this multi-write stage is over)
  • Switch cache reads to mcrouter on testwiki/mediawiki.org; wait 3 days
  • Switch cache reads to mcrouter on all wikis; wait 1 week
  • Remove nutcracker from cache write operations. From this point on, rollback is trickier: it requires either restarting the cache servers or relying on purgeChangedFiles.php/purgeChangedPages.php to purge stale entries; wait 1 day
  • Enable prefix-based wildcard purges for mcrouter for test wikis and mediawiki.org. Any rollback to nutcracker would need to revert this too, because nutcracker does not understand wildcard purges: it would treat the wildcard key literally and purge only a key with that exact name, so the purge would effectively do nothing; wait 2 days
  • Enable prefix-based wildcard purges for mcrouter for all wikis.
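
The staging above can be modeled with a toy in-memory sketch. It is not MediaWiki's configuration or its MultiWriteBagOStuff class; the class and stage names are hypothetical, per-wiki scoping is omitted, and it assumes (as the note about doubled space usage suggests) that nutcracker and mcrouter address separate key spaces, so each is modeled as its own dictionary. It shows why removing nutcracker from writes is where rollback gets harder: updates stop reaching nutcracker, so switching reads back would serve stale values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class Backend:
    """Toy in-memory stand-in for the key space behind one proxy (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


@dataclass
class Stage:
    write_to: List[str]  # backends that receive every set/delete
    read_from: str       # the single backend that serves reads


# Hypothetical stage list mirroring the plan's order (per-wiki scoping omitted).
STAGES = [
    Stage(["nutcracker"], "nutcracker"),              # before the rollout
    Stage(["nutcracker", "mcrouter"], "nutcracker"),  # multi-write, reads unchanged
    Stage(["nutcracker", "mcrouter"], "mcrouter"),    # reads switched to mcrouter
    Stage(["mcrouter"], "mcrouter"),                  # nutcracker removed from writes
]


class StagedCache:
    def __init__(self, backends: Dict[str, Backend], stage: Stage) -> None:
        self.backends = backends
        self.stage = stage

    def set(self, key: str, value: str) -> None:
        for name in self.stage.write_to:
            self.backends[name].set(key, value)

    def delete(self, key: str) -> None:
        for name in self.stage.write_to:
            self.backends[name].delete(key)

    def get(self, key: str) -> Optional[str]:
        return self.backends[self.stage.read_from].get(key)


if __name__ == "__main__":
    backends = {"nutcracker": Backend(), "mcrouter": Backend()}
    cache = StagedCache(backends, STAGES[1])
    cache.set("page:1", "v1")   # multi-write: both backends get v1
    cache.stage = STAGES[3]
    cache.set("page:1", "v2")   # only mcrouter gets v2
    cache.stage = STAGES[1]     # naive rollback of reads to nutcracker
    print(cache.get("page:1"))  # prints v1, the stale nutcracker copy
```

Keeping writes on both backends until reads have moved is what keeps the earlier steps cheap to roll back: either backend can serve reads without missing updates.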

Monitor

  • Relevant logstash channels: ObjectCache, memcached, and the mediawiki-errors aggregate channel
  • Relevant grafana dashboards: https://grafana.wikimedia.org/dashboard/db/prometheus-memcached-dc-stats?orgId=1
  • Graphite: "MediaWiki.wanobjectcache.*.hit.good.rate" and "MediaWiki.wanobjectcache.*.miss.compute.rate" should stay within reasonable ranges throughout every step (some dip is expected while caches warm up)
  • Other things to watch: performance.wikimedia.org graphs
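
One way to reduce the two Graphite series to a single number is the share of good hits among good hits plus compute misses. This is a simplification: the metric names suggest other hit and miss outcomes may also be reported, and the 0.1 tolerance below is a hypothetical threshold, not a documented alerting rule.

```python
from typing import Optional


def good_hit_ratio(hit_good_rate: float, miss_compute_rate: float) -> Optional[float]:
    """Share of counted lookups that were good hits; None when there is no traffic."""
    total = hit_good_rate + miss_compute_rate
    if total <= 0:
        return None
    return hit_good_rate / total


def looks_sane(current: Optional[float], baseline: float, max_drop: float = 0.1) -> bool:
    """Hypothetical check: tolerate a dip of up to max_drop (absolute) while caches warm up."""
    if current is None:
        return False
    return baseline - current <= max_drop
```

For example, rates of 900 and 100 per second give a ratio of 0.9; against a baseline of 0.9, a dip to 0.85 passes, while a drop to 0.7 would warrant a closer look.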

Known issues and patches:

Other patches:

Details

Project                      Branch              Lines +/-
operations/mediawiki-config  master              +1 -2
operations/mediawiki-config  master              +1 -1
operations/mediawiki-config  master              +1 -32
operations/mediawiki-config  master              +1 -3
operations/mediawiki-config  master              +3 -1
operations/mediawiki-config  master              +9 -10
mediawiki/core               wmf/1.32.0-wmf.14   +56 -11
mediawiki/core               master              +56 -11
mediawiki/core               wmf/1.32.0-wmf.13   +19 -2
mediawiki/core               master              +19 -2
mediawiki/core               wmf/1.32.0-wmf.12   +34 -2
mediawiki/core               master              +34 -2
mediawiki/core               master              +1 -1
operations/mediawiki-config  master              +2 -6
operations/mediawiki-config  master              +2 -6
integration/config           master              +1 -2
operations/mediawiki-config  master              +28 -7

Event Timeline

Krinkle renamed this task from "Production MediaWiki mcrouter use rollout" to "Rollout use of mcrouter for MediaWiki in production" (Jun 27 2018, 1:52 AM).

Change 440469 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[operations/mediawiki-config@master] Make mediawiki.org write to both nutcracker and mcrouter

https://gerrit.wikimedia.org/r/440469

+1 to the overall plan; I'd like to see dates attached to the various steps now, so that we can have a clear schedule.

First I want to confirm that T197450 is unrelated. Once that is ruled out, I can set some SWAT dates.

Change 440469 merged by jenkins-bot:
[operations/mediawiki-config@master] Make test wikis just write to both nutcracker and mcrouter

https://gerrit.wikimedia.org/r/440469

Followed shortly by https://gerrit.wikimedia.org/r/443970 for mw.org

Change 440470 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[operations/mediawiki-config@master] Make all non-test wikis write to both nutcracker and mcrouter

https://gerrit.wikimedia.org/r/440470

Change 443977 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate BreadCrumbs extension to Quibble

https://gerrit.wikimedia.org/r/443977

Change 443977 merged by jenkins-bot:
[integration/config@master] Migrate BreadCrumbs extension to Quibble

https://gerrit.wikimedia.org/r/443977

Change 440470 merged by jenkins-bot:
[operations/mediawiki-config@master] Make all non-test wikis write to both nutcracker and mcrouter

https://gerrit.wikimedia.org/r/440470

Change 444932 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[operations/mediawiki-config@master] Make all non-test wikis write to both nutcracker and mcrouter again

https://gerrit.wikimedia.org/r/444932

Change 444932 merged by jenkins-bot:
[operations/mediawiki-config@master] Make all non-test wikis write to both nutcracker and mcrouter again

https://gerrit.wikimedia.org/r/444932

Mentioned in SAL (#wikimedia-operations) [2018-07-11T20:57:51Z] <krinkle@deploy1001> Synchronized wmf-config/mc.php: Ifa659de6453 - Revert multi-write mcrouter for most wikis - T198239 (duration: 00m 58s)

Change 445314 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: make BagOStuff::mergeViaLock() timeout more sensible

https://gerrit.wikimedia.org/r/445314

Change 445012 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: minor fix to MultiWriteBagOStuff::doWrite()

https://gerrit.wikimedia.org/r/445012

Change 445040 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] [WIP] Make MultiWriteBagOStuff use the native merge() of each backend

https://gerrit.wikimedia.org/r/445040

Change 445427 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: improve logging and error handling in BagOStuff

https://gerrit.wikimedia.org/r/445427

Krinkle updated the task description.

Change 445012 merged by jenkins-bot:
[mediawiki/core@master] objectcache: minor fix to MultiWriteBagOStuff::doWrite()

https://gerrit.wikimedia.org/r/445012

Change 445314 merged by jenkins-bot:
[mediawiki/core@master] objectcache: make BagOStuff::mergeViaLock() timeout more sensible

https://gerrit.wikimedia.org/r/445314
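
Change 445314 adjusts the timeout used by BagOStuff::mergeViaLock(). The general lock-then-read-modify-write pattern that name refers to can be sketched as a toy Python model. It is not MediaWiki's PHP code: the lock-key suffix, the None-means-abort callback convention, and the default timeout are hypothetical and do not reflect the values chosen in that change.

```python
import time
from typing import Callable, Dict, Optional


class LockingCache:
    """Toy single-process cache; add() fails if the key exists, like memcached's add."""

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}

    def add(self, key: str, value: object) -> bool:
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def get(self, key: str) -> Optional[object]:
        return self.data.get(key)

    def set(self, key: str, value: object) -> None:
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)

    def merge_via_lock(
        self,
        key: str,
        callback: Callable[[Optional[object]], Optional[object]],
        lock_timeout: float = 1.0,
        poll_interval: float = 0.01,
    ) -> bool:
        """Read-modify-write `key` under an advisory lock; give up after lock_timeout seconds."""
        lock_key = key + ":lock"  # hypothetical lock-key naming
        deadline = time.monotonic() + lock_timeout
        # A real memcached lock key would also carry a TTL so a crashed holder cannot wedge it.
        while not self.add(lock_key, 1):
            if time.monotonic() >= deadline:
                return False  # bounded wait instead of blocking the request
            time.sleep(poll_interval)
        try:
            new_value = callback(self.get(key))
            if new_value is None:
                return False  # callback chose not to write
            self.set(key, new_value)
            return True
        finally:
            self.delete(lock_key)
```

The timeout is the trade-off knob here: too short and merges fail under light contention, too long and a request can stall behind a slow lock holder.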

Change 446342 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@wmf/1.32.0-wmf.12] objectcache: make BagOStuff::mergeViaLock() timeout more sensible

https://gerrit.wikimedia.org/r/446342

Change 446342 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.12] objectcache: make BagOStuff::mergeViaLock() timeout more sensible

https://gerrit.wikimedia.org/r/446342

Change 445040 merged by jenkins-bot:
[mediawiki/core@master] Make MultiWriteBagOStuff use the native merge() of each backend

https://gerrit.wikimedia.org/r/445040
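
Change 445040's title says MultiWriteBagOStuff should use the native merge() of each backend. A toy Python model of that idea: each backend performs its own optimistic check-and-set merge, and the multi-write wrapper calls each in turn. The gets/cas token scheme, retry count, and class names are assumptions for illustration, not MediaWiki's implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Callback = Callable[[Optional[object]], Optional[object]]


class CasBackend:
    """Toy backend with memcached-style gets/cas: each stored value carries a version token."""

    def __init__(self) -> None:
        self.data: Dict[str, Tuple[object, int]] = {}

    def gets(self, key: str) -> Tuple[Optional[object], Optional[int]]:
        entry = self.data.get(key)
        return (None, None) if entry is None else entry

    def cas(self, key: str, value: object, token: Optional[int]) -> bool:
        # token None means "key was absent when read" (add-like semantics in this toy).
        entry = self.data.get(key)
        current = None if entry is None else entry[1]
        if current != token:
            return False  # someone else wrote in between
        self.data[key] = (value, (current or 0) + 1)
        return True

    def merge(self, key: str, callback: Callback, attempts: int = 10) -> bool:
        """Optimistic read-modify-write: retry until the CAS succeeds or attempts run out."""
        for _ in range(attempts):
            value, token = self.gets(key)
            new_value = callback(value)
            if new_value is None:
                return False
            if self.cas(key, new_value, token):
                return True
        return False


class MultiWriteCache:
    """Delegates merge() to each backend's own merge instead of one shared lock."""

    def __init__(self, backends: List[CasBackend]) -> None:
        self.backends = backends

    def merge(self, key: str, callback: Callback) -> bool:
        # Build the full list first so every backend is attempted even if one fails.
        results = [backend.merge(key, callback) for backend in self.backends]
        return all(results)
```

One consequence of this design is that the callback runs once per backend, against that backend's own current value, so the backends are never forced to share a single lock.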

Change 446763 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@wmf/1.32.0-wmf.13] Make MultiWriteBagOStuff use the native merge() of each backend

https://gerrit.wikimedia.org/r/446763

Change 446763 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.13] Make MultiWriteBagOStuff use the native merge() of each backend

https://gerrit.wikimedia.org/r/446763

Change 445427 merged by jenkins-bot:
[mediawiki/core@master] objectcache: improve logging and error handling in BagOStuff

https://gerrit.wikimedia.org/r/445427

Change 447819 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[operations/mediawiki-config@master] Revert "Revert "Make all non-test wikis write to both nutcracker and mcrouter again""

https://gerrit.wikimedia.org/r/447819

Change 448172 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@wmf/1.32.0-wmf.14] objectcache: improve logging and error handling in BagOStuff

https://gerrit.wikimedia.org/r/448172

Change 448172 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.14] objectcache: improve logging and error handling in BagOStuff

https://gerrit.wikimedia.org/r/448172

Change 447819 merged by jenkins-bot:
[operations/mediawiki-config@master] Make all wikis write to both nutcracker and mcrouter (3)

https://gerrit.wikimedia.org/r/447819

Mentioned in SAL (#wikimedia-operations) [2018-07-30T18:25:51Z] <thcipriani@deploy1001> Synchronized wmf-config/mc.php: SWAT: [[gerrit:447819|Make all wikis write to both nutcracker and mcrouter (3)]] T198239 (duration: 00m 48s)

Change 449603 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[operations/mediawiki-config@master] Use mcrouter for cache reads for test wikis

https://gerrit.wikimedia.org/r/449603

Change 449604 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[operations/mediawiki-config@master] Use mcrouter for cache reads on all wikis

https://gerrit.wikimedia.org/r/449604

Change 449605 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[operations/mediawiki-config@master] Only do cache writes to mcrouter for all wikis

https://gerrit.wikimedia.org/r/449605

Change 449606 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[operations/mediawiki-config@master] Allow broadcasted mcrouter cache operations for purges

https://gerrit.wikimedia.org/r/449606

Change 449603 merged by jenkins-bot:
[operations/mediawiki-config@master] Use mcrouter for cache reads for test wikis

https://gerrit.wikimedia.org/r/449603

Krinkle updated the task description.

Change 449604 merged by jenkins-bot:
[operations/mediawiki-config@master] Use mcrouter for cache reads on all wikis

https://gerrit.wikimedia.org/r/449604

Change 449605 merged by jenkins-bot:
[operations/mediawiki-config@master] Only do cache writes to mcrouter for all wikis

https://gerrit.wikimedia.org/r/449605

Change 449606 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable broadcasted mcrouter cache operations for test wikis and mw.org

https://gerrit.wikimedia.org/r/449606

Change 452592 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[operations/mediawiki-config@master] Enable broadcasted mcrouter operations for all wikis

https://gerrit.wikimedia.org/r/452592

Change 452592 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable broadcasted mcrouter operations for all wikis

https://gerrit.wikimedia.org/r/452592
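
The broadcast purges enabled by these changes rely on mcrouter routing prefixes: keys of the form /region/cluster/key, where a * in the prefix can match more than one destination. The toy model below contrasts that with a proxy that has no routing-prefix support and treats the whole string as a key name, which is the rollback hazard described in the plan. The region and cluster names, and the use of fnmatchcase for matching, are assumptions for illustration rather than mcrouter's actual matching code.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

Dest = Tuple[str, str]              # (region, cluster), hypothetical names
Pools = Dict[Dest, Dict[str, str]]  # destination -> toy key/value store


def parse_routing_prefix(key: str) -> Optional[Tuple[str, str, str]]:
    """Split '/region/cluster/bare_key'; return None if the key has no routing prefix."""
    if not key.startswith("/"):
        return None
    parts = key.split("/", 3)  # ['', region, cluster, bare_key]
    if len(parts) != 4 or not parts[3]:
        return None
    return parts[1], parts[2], parts[3]


def prefix_aware_delete(pools: Pools, key: str, default: Dest) -> List[Dest]:
    """mcrouter-like: expand wildcards in the prefix and delete bare_key at each match."""
    parsed = parse_routing_prefix(key)
    if parsed is None:
        targets, bare_key = [default], key
    else:
        region, cluster, bare_key = parsed
        targets = [d for d in pools
                   if fnmatchcase(d[0], region) and fnmatchcase(d[1], cluster)]
    for dest in targets:
        pools[dest].pop(bare_key, None)
    return targets


def literal_delete(pools: Pools, key: str, default: Dest) -> List[Dest]:
    """Proxy without routing prefixes: the whole string, '/*/...' included, is the key."""
    pools[default].pop(key, None)
    return [default]
```

With both destinations holding key "k", prefix_aware_delete(pools, "/*/mw-wan/k", ...) removes it from every matching destination, while literal_delete looks for a key literally named "/*/mw-wan/k" and leaves "k" untouched everywhere.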

aaron updated the task description.