
Phased rollout of dbctl, etcd-backed database configuration in MediaWiki
Closed, Resolved (Public)

Description

Current revert instructions

See https://wikitech.wikimedia.org/wiki/Dbctl#Emergency_revert_to_static_configs

Background

dbctl is a tool based on conftool that stores MediaWiki's database load balancer configuration in etcd. This ticket tracks its rollout to WMF production.

Rollout is planned to begin on Tuesday 30 July.
It should last only one week, because for its duration DBAs will need to perform the extra toil of modifying the database configuration in both mediawiki-config and dbctl.

Proposed rollout plan:

The week of 30 July and the week of 5 August are Americas MediaWiki train weeks, so on each day this rollout will be timed to finish well before the train begins.

The set of hosts using dbctl is controlled by the array $dbctl_enabled_hosts in CommonSettings.php.
For the first two days of the rollout, the quickest way to revert will be to depool the small number of appservers involved.
If a revert is needed after Thursday, it would be wiser to perform a mediawiki-config change that removes entries from $dbctl_enabled_hosts instead.
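
The host gating described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the real logic lives in PHP in CommonSettings.php), and it is not the actual implementation: the names `DBCTL_ENABLED_HOSTS`, `short_hostname`, `uses_dbctl`, and `pick_db_config` are hypothetical, and matching on the short hostname is an assumption.

```python
# Hypothetical stand-in for the $dbctl_enabled_hosts array in CommonSettings.php.
# The host names are canaries named in this task; the matching rule
# (exact match on the short hostname) is an assumption.
DBCTL_ENABLED_HOSTS = frozenset({"mwdebug1001", "mw1261", "mw1276"})


def short_hostname(name: str) -> str:
    """Return the part of a hostname before the first dot."""
    return name.split(".", 1)[0]


def uses_dbctl(hostname: str, enabled_hosts=DBCTL_ENABLED_HOSTS) -> bool:
    """True if this host should read its database config from etcd."""
    return short_hostname(hostname) in enabled_hosts


def pick_db_config(hostname, static_config, etcd_config,
                   enabled_hosts=DBCTL_ENABLED_HOSTS):
    """Choose the etcd-backed config on enabled hosts, the static one elsewhere."""
    if uses_dbctl(hostname, enabled_hosts):
        return etcd_config
    return static_config
```

Under this model, a config-level revert amounts to removing a host from the set: `pick_db_config("mw1276", static, etcd, DBCTL_ENABLED_HOSTS - {"mw1276"})` returns the static configuration again.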

Tuesday 30 July:
  • mwdebug1001
  • mwdebug*
  • mw1261 (a single canary appserver)
  • mw1276 (a single canary apiserver)
  • all canary appservers, apiservers, jobrunners
Wednesday:
  • 25% of appservers and apiservers and jobrunners
Thursday:
  • 100% of appservers and apiservers and jobrunners
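
One way to pick the hosts for each percentage stage is sketched below. This task does not say how the 25% subset was chosen, so the hash-based bucketing here is purely an assumption, and `bucket`, `hosts_for_stage`, and the example fleet are hypothetical. The property it illustrates is that each stage is a superset of the previous one, so widening the rollout never removes a host that was already enabled.

```python
import hashlib


def bucket(hostname: str) -> int:
    """Map a hostname to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def hosts_for_stage(hosts, percent: int):
    """Return the hosts whose bucket falls below the given percentage."""
    return sorted(h for h in hosts if bucket(h) < percent)


# Made-up fleet, for illustration only.
fleet = [f"mw{n}" for n in range(1200, 1400)]
stage_25 = hosts_for_stage(fleet, 25)
stage_100 = hosts_for_stage(fleet, 100)
```

Because the bucket depends only on the hostname, the selection is reproducible across deployments, though the share of hosts selected at 25% is only approximately a quarter of the fleet.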

Event Timeline

Change 525684 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/mediawiki-config@master] Initial canary of dbctl, db config from etcd

https://gerrit.wikimedia.org/r/525684

jijiki triaged this task as Medium priority. Jul 26 2019, 10:26 AM
jijiki edited projects, added serviceops-radar; removed serviceops.

Change 526393 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] dbctl: add missing instances

https://gerrit.wikimedia.org/r/526393

Change 526393 merged by Volans:
[operations/puppet@production] dbctl: add missing instances

https://gerrit.wikimedia.org/r/526393

Change 525684 merged by jenkins-bot:
[operations/mediawiki-config@master] Initial canary of dbctl, db config from etcd

https://gerrit.wikimedia.org/r/525684

Change 526437 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/mediawiki-config@master] dbctl: enable on mwdebug* and two canaries

https://gerrit.wikimedia.org/r/526437

Change 526437 merged by jenkins-bot:
[operations/mediawiki-config@master] dbctl: enable on mwdebug* and two canaries

https://gerrit.wikimedia.org/r/526437

Change 526468 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/mediawiki-config@master] dbctl: enable on all canaries

https://gerrit.wikimedia.org/r/526468

Change 526468 merged by jenkins-bot:
[operations/mediawiki-config@master] dbctl: enable on all canaries

https://gerrit.wikimedia.org/r/526468

Change 526607 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbctl: Add new instance

https://gerrit.wikimedia.org/r/526607

Change 526607 merged by Marostegui:
[operations/puppet@production] dbctl: Add new instance

https://gerrit.wikimedia.org/r/526607

Change 526669 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/mediawiki-config@master] dbctl: expand to 10% of appservers

https://gerrit.wikimedia.org/r/526669

Change 526669 merged by jenkins-bot:
[operations/mediawiki-config@master] dbctl: expand to 25% of appservers

https://gerrit.wikimedia.org/r/526669

Change 526735 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/mediawiki-config@master] dbctl: disable on half of canary hosts

https://gerrit.wikimedia.org/r/526735

Change 526735 merged by jenkins-bot:
[operations/mediawiki-config@master] dbctl: disable on half of canary hosts

https://gerrit.wikimedia.org/r/526735

Change 527104 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/mediawiki-config@master] dbctl: to 100%!

https://gerrit.wikimedia.org/r/527104

Change 527104 merged by jenkins-bot:
[operations/mediawiki-config@master] dbctl: to 100%!

https://gerrit.wikimedia.org/r/527104

Change 525684 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/mediawiki-config@master] Initial canary of dbctl, db config from etcd

https://gerrit.wikimedia.org/r/525684

From a perf perspective, this is the patch that matters most, as it adds an extra etcd call to all web requests on all entry points. I had assumed this would be conditional as well; otherwise I would have done the perf analysis earlier. But here goes.

It was deployed July 30 15:00 UTC.

So far so good :)

I've checked *.api.svgz as well for these dates, and there it's stable around 0.24% - 0.26% both before and after the change.

For *.index and *.all, the child node is consistently not sampled (too small).

CDanis updated the task description.

Change 527245 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] Revert "dbctl: diff PHP vs dbctl configs"

https://gerrit.wikimedia.org/r/527245

Change 527245 merged by CDanis:
[operations/puppet@production] Revert "dbctl: diff PHP vs dbctl configs"

https://gerrit.wikimedia.org/r/527245