Meta-task for tracking the live test and any follow-up work.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Clement_Goubert | T327920 March 2023 Datacenter Switchover
Resolved | | Clement_Goubert | T330271 March 2023 Datacenter Switchover live test
Resolved | | Clement_Goubert | T330300 sre.switchdc.mediawiki.07-set-readwrite doesn't reset both datacenter to rw
Resolved | | Clement_Goubert | T330302 sre.switchdc.mediawiki.03-set-db-readonly fails in live-test mode
Resolved | | Marostegui | T330619 Enable DB replication codfw -> eqiad before the switchover
Event Timeline
10:35 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
10:35 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
Skipping 00-optional-warmup-caches, as the Node script is broken and the replacement Python script hasn't been reviewed yet.
11:01 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
11:01 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
11:01 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
11:01 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
11:02 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
11:02 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
11:03 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.02-set-readonly
11:03 <+logmsgbot> !log cgoubert@cumin1001 [DRY-RUN] MediaWiki read-only period starts at: 2023-02-22 11:03:19.149671
11:03 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
11:03 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
11:04 <+logmsgbot> !log cgoubert@cumin1001 END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
spicerack.mysql_legacy.MysqlLegacyError: Unable to get heartbeat from master db1118.eqiad.wmnet for section s1
This comment was removed by Clement_Goubert.
11:13 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
11:13 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
11:13 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
11:13 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
11:13 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
11:13 <+logmsgbot> !log cgoubert@cumin1001 [DRY-RUN] MediaWiki read-only period ends at: 2023-02-22 11:13:51.466468
11:13 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
11:14 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
11:14 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
11:14 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
11:16:26 +logmsgbot │ !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
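For reference, the length of the window between the two [DRY-RUN] "read-only period" timestamps logged by 02-set-readonly and 07-set-readwrite can be worked out as below. This is only a minimal sketch: the two timestamps are copied from the log entries, the variable names are illustrative, and the result covers the whole live test (including the failed 03-set-db-readonly step), so it should not be read as a measured production read-only time.

```python
from datetime import datetime

# Timestamps copied verbatim from the two [DRY-RUN] log entries.
FMT = "%Y-%m-%d %H:%M:%S.%f"
starts_at = datetime.strptime("2023-02-22 11:03:19.149671", FMT)
ends_at = datetime.strptime("2023-02-22 11:13:51.466468", FMT)

# Difference between the logged end and start of the read-only period.
window = ends_at - starts_at
print(window)  # 0:10:32.316797
```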
11:18 <+logmsgbot> !log cgoubert@cumin1001 START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
11:24 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
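To get a quick pass/fail overview of a run like this one, the END entries pasted in the comments can be summarized with a short script. The sketch below is an assumption-laden helper, not part of the cookbooks or of Spicerack: `summarize` and `END_RE` are hypothetical names, and the regular expression only assumes the `END (PASS|FAIL) - Cookbook <name> (exit_code=N)` shape seen in the entries on this task.

```python
import re

# Matches the END entries as they appear in the pasted !log lines.
END_RE = re.compile(
    r"END \((?P<status>PASS|FAIL)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def summarize(log_text):
    """Return (cookbook, status, exit_code) for every END entry, in order."""
    return [
        (m["cookbook"], m["status"], int(m["code"]))
        for m in END_RE.finditer(log_text)
    ]


# Two entries copied from the comments on this task.
sample = (
    "11:03 <+logmsgbot> !log cgoubert@cumin1001 END (PASS) - Cookbook "
    "sre.switchdc.mediawiki.02-set-readonly (exit_code=0) "
    "11:04 <+logmsgbot> !log cgoubert@cumin1001 END (FAIL) - Cookbook "
    "sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)"
)

for cookbook, status, code in summarize(sample):
    print(f"{status} {cookbook} (exit_code={code})")
```

Because the helper scans with `finditer`, it works whether the entries are pasted one per line or run together on a single line.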
Incident doc for the minor editing incident caused by T330300 (sre.switchdc.mediawiki.07-set-readwrite doesn't reset both datacenter to rw): https://wikitech.wikimedia.org/wiki/Incidents/2023-02-22_read_only