This is an umbrella ☂️ task for the upcoming Northward Switchover.
As of September 2023, switchovers take place on predictable dates: [[ https://wikitech.wikimedia.org/wiki/Switch_Datacenter/Recurring,_Equinox-based,_Data_Center_Switchovers | the work week of the Solar Equinox ]].
Important Dates:
- **Services:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
- **Traffic:** [[ https://zonestamp.toolforge.org/1710856800 | Tuesday, 19 March 2024 @14:00 UTC ]]
- **MediaWiki:** [[ https://zonestamp.toolforge.org/1710943200 | Wednesday, 20 March 2024 @14:00 UTC ]]
- **Deployment server:** Thursday, 21 March 2024
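
The zonestamp links above carry the switchover times as numbers in the URL path. As a quick sanity check, and assuming those numbers are Unix epoch seconds, a minimal sketch like the following renders them back into the UTC dates listed. The dictionary and the `format_utc` helper are illustrative names, not part of any switchover tooling.

```python
from datetime import datetime, timezone

# Values copied from the zonestamp links above; assumed to be Unix epoch seconds.
SWITCHOVER_EPOCHS = {
    "Services": 1710856800,
    "Traffic": 1710856800,
    "MediaWiki": 1710943200,
}


def format_utc(epoch_seconds: int) -> str:
    """Render epoch seconds in the same style as the dates listed in this task."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%A, %d %B %Y @%H:%M UTC")


if __name__ == "__main__":
    for component, epoch in SWITCHOVER_EPOCHS.items():
        print(f"{component}: {format_utc(epoch)}")
```

The weekday and month names come from `strftime`, so they appear in English only under the default C locale.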
**Day 1 issues:**
* Kartotherian started running out of resources, so we had to repool Kartotherian in codfw and restart the service in both datacentres.
* Thumbor was using `swift.discovery.wmnet`, so Thumbor in codfw was attempting to access Swift in eqiad using codfw's credentials, causing a large number of 401s.
* mw-on-k8s started working harder than usual. This was expected since we had turned off multi-DC, but we added more resources to be on the safe side: 53 replicas to mw-web and 10 to mw-api-ext.
* By an unfortunate coincidence, changeprop was overwhelmed for unrelated reasons around the time of the services switchover, causing jobs to pile up.
**Day 2 issues:**
* While stopping all maintenance scripts (`01-stop-maintenance`), we found a user-triggered script, which we fiercely killed manually before continuing the process.
**Day 3 issues:**
* We switched to deploy1002.eqiad.wmnet without any issues.