WARNING: For general safety, please keep children away from this Task. The //blasphemy-o-meter// installed in the IT department is going off the expected scale.
Good morning everyone!
What a lovely day to catch fire! 🌈
The server `fabula.wikimedia.it` is offline due to a major incident in the `SBG` datacenter in Strasbourg, owned by our service provider. Specifically, the datacenter `SBG2` caught fire and was destroyed. Literally.
* http://travaux.ovh.net/?do=details&id=49484 - ticket about `SBG`
* http://travaux.ovh.net/?do=details&id=49471 - ticket about `SBG1`
* https://twitter.com/olesovhcom/status/1369478732247932929 - official tweet
* https://twitter.com/arthfl/status/1369604089034735620 (!)
This Task tracks info about the upstream outage and our Disaster Recovery. Spoiler: there will be no automagic recovery today.
The good part is that we should have a lot of cute snapshots safely stored in the datacenter `SBG3` (which was not damaged), but at the time of writing we have no way to migrate these snapshots from that datacenter to another one because everything is frozen.
[X] `2021-03-10 09:00 CET` requested restore of the latest snapshot in another datacenter (`DE1`)
[X] generic response from our service provider
[X] `2021-03-31 19:46 CET` info from our service provider about our server `fabula`
[X] `2021-03-31 22:30 CET` migrating latest snapshot of our server `fabula` to another datacenter (`SBG7`)
[X] `2021-04-01 07:40 CET` experiencing issues with the latest snapshot, started troubleshooting in emergency mode
[ ] `no ETA ` ~~restore incoming traffic to `fabula` (DNS rollback)~~ - this will not be done, we will just use `intreccio` for production
[ ] `2021-04-01 09:00 CET` restored cinquex1000
[X] `2021-03-10 09:36 CET` creation of another fallback VM (called `intreccio`) in another datacenter (thanks to @Nemo_bis)
[X] `2021-03-10 09:44 CET` start pushing the latest backups we have on hand to `intreccio` and preparing the VM
[X] `2021-03-10 10:28 CET` start migrating traffic of https://www.wikimedia.it/ to be served from `intreccio` (thanks to @M7)
[X] `2021-03-10 10:56 CET` make https://www.wikimedia.it/ up 'n' running again thanks to an off-site backup dated `2021-03-02` made during T276206
[X] `2021-03-10 11:22 CET` set up a generic catch-all failure notice on `intreccio`
[ ] `no ETA ` fix or reinstall other services on `intreccio`