🧭 Northward Datacentre Switchover (March 2025)
Closed, Resolved (Public)

Description

Important Dates:

Event Timeline

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-disable-puppet for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:02.492235

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-downtime-db-readonly-checks for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:20.062482

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-optional-warmup-caches for datacenter switchover from eqiad to codfw - finished with status: FAILURE elapsed time: 0:00:12.516407

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-reduce-ttl for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:05:45.662908

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:15.506655

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from eqiad to codfw - [DRY-RUN] MediaWiki read-only period starts at: 2025-02-27 17:34:09.402528

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:15.227370

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.03-set-db-readonly for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:35.394616

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.04-switch-mediawiki for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:21.056334

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.06-set-db-readwrite for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:02.428644

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.07-set-readwrite for datacenter switchover from eqiad to codfw - [DRY-RUN] MediaWiki read-only period ends at: 2025-02-27 17:36:42.297422

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.07-set-readwrite for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:05.756227

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.08-restart-mw-jobrunner for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:32.887205

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:02:24.990807

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.09-restore-ttl for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:40.452280

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.09-run-puppet-on-db-masters for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:11:19.318929
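The cookbook entries above share a fixed shape (step name, source and target datacenter, status, elapsed time), so they can be tabulated mechanically. The following is a minimal sketch, assuming the line format seen in this timeline; the names `LINE_RE` and `parse_step` are hypothetical and not part of the cookbooks or Spicerack.

```python
import re
from datetime import timedelta

# Pattern inferred from the log lines in this timeline (an assumption,
# not an official format specification).
LINE_RE = re.compile(
    r"Cookbook cookbooks\.(?P<step>\S+) for datacenter switchover "
    r"from (?P<src>\w+) to (?P<dst>\w+) - finished with status: "
    r"(?P<status>\w+) elapsed time: (?P<h>\d+):(?P<m>\d+):(?P<s>\d+\.\d+)"
)


def parse_step(line):
    """Return (step, src, dst, status, elapsed) or None if the line
    is not a 'finished with status' entry (e.g. a read-only marker)."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    elapsed = timedelta(
        hours=int(m["h"]), minutes=int(m["m"]), seconds=float(m["s"])
    )
    return m["step"], m["src"], m["dst"], m["status"], elapsed


if __name__ == "__main__":
    sample = (
        "hnowlan@cumin2002 - Cookbook "
        "cookbooks.sre.switchdc.mediawiki.00-optional-warmup-caches "
        "for datacenter switchover from eqiad to codfw - "
        "finished with status: FAILURE elapsed time: 0:00:12.516407"
    )
    print(parse_step(sample))
```

Filtering the parsed tuples on the status field gives a quick list of steps that needed a rerun, such as the warmup-caches failure recorded above.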

Change #1126090 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/cookbooks@master] switchdc: stop and restart crons as part of swithover process

https://gerrit.wikimedia.org/r/1126090

Change #1127067 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/dns@master] wmnet: update CNAME records for DB masters to eqiad

https://gerrit.wikimedia.org/r/1127067

Change #1127069 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/dns@master] geo-maps: update map default to list eqiad first

https://gerrit.wikimedia.org/r/1127069

Change #1127068 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/dns@master] wmnet: update CNAME record for maintenance host to eqiad

https://gerrit.wikimedia.org/r/1127068

Change #1127072 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/mediawiki-config@master] debug: reorder debug backends for eqiad switchover

https://gerrit.wikimedia.org/r/1127072

Change #1127073 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/dns@master] wmnet: point deploy server at eqiad

https://gerrit.wikimedia.org/r/1127073

Change #1127074 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] deployment: switch deploy servers to eqiad

https://gerrit.wikimedia.org/r/1127074

Change #1126090 merged by jenkins-bot:

[operations/cookbooks@master] switchdc: stop and restart crons as part of switchover process

https://gerrit.wikimedia.org/r/1126090

Change #1127859 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] mw-(web|api-ext): scale up in anticipation of switchover

https://gerrit.wikimedia.org/r/1127859

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from eqiad to codfw - finished with status: FAILURE elapsed time: 0:00:15.600535

Change #1127878 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/cookbooks@master] switchdc: delete Job objects for mw-cron due to library support

https://gerrit.wikimedia.org/r/1127878

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:00:15.654838

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from eqiad to codfw - finished with status: SUCCESS elapsed time: 0:02:30.880468

Change #1127859 merged by jenkins-bot:

[operations/deployment-charts@master] mw-(web|api-ext): scale up in anticipation of switchover

https://gerrit.wikimedia.org/r/1127859

Change #1127878 merged by jenkins-bot:

[operations/cookbooks@master] switchdc: delete Job objects for mw-cron due to library support

https://gerrit.wikimedia.org/r/1127878

hnowlan@cumin2002 - Cookbook cookbooks.sre.discovery.datacenter depool all services in codfw: Datacenter Switchover - T385155 started.

Mentioned in SAL (#wikimedia-operations) [2025-03-18T15:05:01Z] <hnowlan@cumin2002> START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Switchover - T385155

Change #1128895 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/cookbooks@master] switchdc: clarify inputs for moving active/passive services

https://gerrit.wikimedia.org/r/1128895

hnowlan@cumin2002 - Cookbook cookbooks.sre.discovery.datacenter depool all services in codfw: Datacenter Switchover - T385155 completed.

Mentioned in SAL (#wikimedia-operations) [2025-03-18T15:34:44Z] <hnowlan@cumin2002> END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in codfw: Datacenter Switchover - T385155

Mentioned in SAL (#wikimedia-operations) [2025-03-19T13:48:44Z] <hnowlan@deploy2002> Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover - T385155

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-disable-puppet for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:02.515982

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-downtime-db-readonly-checks for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:18.881572

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.00-reduce-ttl for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:05:49.912269

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from codfw to eqiad - finished with status: FAILURE elapsed time: 0:00:10.629439

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:26.733239

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from codfw to eqiad - MediaWiki read-only period starts at: 2025-03-19 14:15:30.955779

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:18.786734

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.03-set-db-readonly for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:34.506271

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.04-switch-mediawiki for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:49.388759

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.06-set-db-readwrite for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:03.122358

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.07-set-readwrite for datacenter switchover from codfw to eqiad - MediaWiki read-only period ends at: 2025-03-19 14:17:55.451583

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.07-set-readwrite for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:12.437502
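The length of the MediaWiki read-only window follows directly from the "starts at" and "ends at" timestamps logged by the 02-set-readonly and 07-set-readwrite steps. A minimal sketch of that arithmetic, using the timestamps recorded in this timeline; the helper name `read_only_seconds` is hypothetical.

```python
from datetime import datetime

# Timestamp layout as it appears in the cookbook log lines.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def read_only_seconds(start, end):
    """Seconds between a read-only 'starts at' and 'ends at' timestamp."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    # Dry run (eqiad to codfw), 2025-02-27
    dry_run = read_only_seconds(
        "2025-02-27 17:34:09.402528", "2025-02-27 17:36:42.297422"
    )
    # Live run (codfw to eqiad), 2025-03-19
    live = read_only_seconds(
        "2025-03-19 14:15:30.955779", "2025-03-19 14:17:55.451583"
    )
    print(f"dry run: {dry_run:.3f}s, live: {live:.3f}s")
```

With these inputs the dry run comes to roughly 2 minutes 33 seconds and the live switchover to roughly 2 minutes 24 seconds of read-only time.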

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.08-restart-mw-jobrunner for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:30.255885

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:02:39.089447

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.09-restore-ttl for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:00:39.885710

Change #1127067 merged by Hnowlan:

[operations/dns@master] wmnet: update CNAME records for DB masters to eqiad

https://gerrit.wikimedia.org/r/1127067

hnowlan@cumin2002 - Cookbook cookbooks.sre.switchdc.mediawiki.09-run-puppet-on-db-masters for datacenter switchover from codfw to eqiad - finished with status: SUCCESS elapsed time: 0:10:38.157629

Mentioned in SAL (#wikimedia-operations) [2025-03-19T14:41:24Z] <hnowlan@deploy2002> Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover - T385155 (duration: 52m 40s)

Change #1127068 merged by Hnowlan:

[operations/dns@master] wmnet: update CNAME record for maintenance host to eqiad

https://gerrit.wikimedia.org/r/1127068

Change #1127069 merged by Hnowlan:

[operations/dns@master] geo-maps: update map default to list eqiad first

https://gerrit.wikimedia.org/r/1127069

Change #1127072 merged by jenkins-bot:

[operations/mediawiki-config@master] debug: reorder debug backends for eqiad switchover

https://gerrit.wikimedia.org/r/1127072

Mentioned in SAL (#wikimedia-operations) [2025-03-19T15:18:04Z] <hnowlan@deploy2002> Started scap sync-world: Backport for [[gerrit:1127072|debug: reorder debug backends for eqiad switchover (T385155)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-19T15:23:33Z] <hnowlan@deploy2002> hnowlan: Backport for [[gerrit:1127072|debug: reorder debug backends for eqiad switchover (T385155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Change #1129296 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/mediawiki-config@master] debug: fix config syntax

https://gerrit.wikimedia.org/r/1129296

Change #1129296 merged by jenkins-bot:

[operations/mediawiki-config@master] debug: fix config syntax

https://gerrit.wikimedia.org/r/1129296

Mentioned in SAL (#wikimedia-operations) [2025-03-19T15:36:28Z] <hnowlan@deploy2002> Started scap sync-world: Backport for [[gerrit:1129296|debug: fix config syntax (T385155)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-19T15:41:33Z] <hnowlan@deploy2002> hnowlan: Backport for [[gerrit:1129296|debug: fix config syntax (T385155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-19T15:53:40Z] <hnowlan@deploy2002> Finished scap sync-world: Backport for [[gerrit:1129296|debug: fix config syntax (T385155)]] (duration: 17m 11s)

Change #1129945 had a related patch set uploaded (by Jasmine; author: Jasmine):

[operations/dns@master] wmnet: update deployment CNAME record to deploy1003

https://gerrit.wikimedia.org/r/1129945

Mentioned in SAL (#wikimedia-operations) [2025-03-20T21:49:45Z] <kamila@deploy2002> Locking from deployment [MediaWiki]: deployment server switch -- T385155

Change #1129945 abandoned by Jasmine:

[operations/dns@master] wmnet: update deployment CNAME record to deploy1003

Reason:

Change already created

https://gerrit.wikimedia.org/r/1129945

Change #1127073 merged by Kamila Součková:

[operations/dns@master] wmnet: point deploy server at eqiad

https://gerrit.wikimedia.org/r/1127073

Change #1129952 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/puppet@production] hieradata: update deployment_server to deploy1003

https://gerrit.wikimedia.org/r/1129952

Change #1129952 merged by Kamila Součková:

[operations/puppet@production] hieradata: update deployment_server to deploy1003

https://gerrit.wikimedia.org/r/1129952

Mentioned in SAL (#wikimedia-operations) [2025-03-20T22:58:16Z] <kamila@deploy2002> Unlocked for deployment [MediaWiki]: deployment server switch -- T385155 (duration: 68m 30s)

Mentioned in SAL (#wikimedia-operations) [2025-03-20T23:10:53Z] <kamila@deploy1003> Started scap sync-world: Test deployment to validate deployment server switchover - T385155

Mentioned in SAL (#wikimedia-operations) [2025-03-20T23:30:36Z] <kamila@deploy1003> Finished scap sync-world: Test deployment to validate deployment server switchover - T385155 (duration: 19m 42s)

Mentioned in SAL (#wikimedia-operations) [2025-03-26T14:06:24Z] <hnowlan@cumin1002> START - Cookbook sre.dns.admin DNS admin: pool site codfw [reason: Datacentre switchover repool, T385155]

Mentioned in SAL (#wikimedia-operations) [2025-03-26T14:06:40Z] <hnowlan@cumin1002> END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site codfw [reason: Datacentre switchover repool, T385155]

hnowlan@cumin1002 - Cookbook cookbooks.sre.discovery.datacenter pool all active/active services in codfw: Datacentre switchover repool - T385155 started.

Mentioned in SAL (#wikimedia-operations) [2025-03-26T14:08:12Z] <hnowlan@cumin1002> START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: Datacentre switchover repool - T385155

hnowlan@cumin1002 - Cookbook cookbooks.sre.discovery.datacenter pool all active/active services in codfw: Datacentre switchover repool - T385155 completed.

Mentioned in SAL (#wikimedia-operations) [2025-03-26T14:30:22Z] <hnowlan@cumin1002> END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: Datacentre switchover repool - T385155

Change #1128895 merged by jenkins-bot:

[operations/cookbooks@master] switchdc: clarify inputs for moving active/passive services

https://gerrit.wikimedia.org/r/1128895

Change #1127074 abandoned by Hnowlan:

[operations/puppet@production] deployment: switch deploy servers to eqiad

Reason:

Done in another patch

https://gerrit.wikimedia.org/r/1127074