Practice Galera disaster recovery
Let's confirm alerting and disaster recovery are working properly with the Galera setup.

Proposed steps:

  • Delete one of the databases (glance?) on cloudcontrol2001-dev
  • Stop galera on cloudcontrol2001, 2003, 2004
  • Confirm that some sensible alerts are showing up on icinga
  • Recover!

Rudimentary docs can be found at and on

aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

We seem to have two related services: mysql and mariadb. It was not obvious which one is the actual Galera-enabled database. The mariadb service is full of errors and apparently won't start. I suggest we drop it if we can; otherwise it could trigger systemd alerts in icinga in the future.
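To make it quicker to tell which of the two units is actually in use, something like the following could help. This is only a sketch: the unit names `mysql` and `mariadb` are taken from the comment above, and `report_db_units` is a hypothetical helper name, not something that exists on the hosts.

```shell
#!/bin/sh
# Hypothetical helper: print the systemd state of both candidate database units.
# Unit names are taken from the task comment; adjust if they differ on the host.
report_db_units() {
    for unit in mysql mariadb; do
        # `systemctl is-active` prints the state (active, inactive, failed, ...)
        # and exits non-zero when the unit is not active.
        state=$(systemctl is-active "$unit" 2>/dev/null)
        echo "$unit: ${state:-unknown}"
    done
}
```

Calling `report_db_units` on a cloudcontrol host prints one line per unit, which should show at a glance which service is running and which one is the broken leftover.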

Mentioned in SAL (#wikimedia-cloud) [2020-07-03T11:36:21Z] <arturo> [codfw1dev] dropped glance database in the galera cluster T256283

Mentioned in SAL (#wikimedia-cloud) [2020-07-03T11:39:12Z] <arturo> [codfw1dev] stopped mysql database in the galera cluster T256283

After dropping the database and stopping the mysql service, icinga shows no alert about the openstack system being in a bad state.

Yet the API is clearly returning HTTP 500:

root@cloudcontrol2001-dev:~# openstack endpoint list
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-073731fd-8af0-4032-8e5e-4dc7eec229ef)
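Since nothing showed up in icinga, one option would be a small plugin-style check on the Galera state itself. The following is a minimal sketch, not an existing check: `check_galera`, `wsrep_value` and `EXPECTED_NODES` are hypothetical names, the expected cluster size of 3 is an assumption based on the three hosts listed in the task, and root access over the local socket is assumed. It relies on the `wsrep_cluster_size` and `wsrep_local_state_comment` status variables that Galera exposes, and on the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```shell
#!/bin/sh
# Hypothetical icinga/nagios-style check for a Galera node.
# Exit codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL.

EXPECTED_NODES="${EXPECTED_NODES:-3}"   # assumption: three-node cluster

# Print the value of one wsrep status variable, or nothing if mysql is unreachable.
wsrep_value() {
    mysql -u root --batch --skip-column-names \
        -e "SHOW GLOBAL STATUS LIKE '$1'" 2>/dev/null | awk '{print $2}'
}

check_galera() {
    size=$(wsrep_value wsrep_cluster_size)
    state=$(wsrep_value wsrep_local_state_comment)

    # No answer at all: the service is down or not accepting connections.
    if [ -z "$size" ]; then
        echo "CRITICAL: cannot query wsrep status (is the mysql service running?)"
        return 2
    fi
    # A healthy node reports "Synced".
    if [ "$state" != "Synced" ]; then
        echo "CRITICAL: node state is '$state', expected 'Synced'"
        return 2
    fi
    # The node is fine, but some peers are missing from the cluster.
    if [ "$size" -lt "$EXPECTED_NODES" ]; then
        echo "WARNING: cluster size is $size, expected $EXPECTED_NODES"
        return 1
    fi
    echo "OK: cluster size $size, node state $state"
    return 0
}
```

A wrapper script would call `check_galera` and exit with its return value. With the mysql service stopped as in this test, the first branch should fire and report CRITICAL. This does not cover the dropped database case, which would need a separate check against the openstack API.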

Mentioned in SAL (#wikimedia-cloud) [2020-07-03T11:44:40Z] <arturo> [codfw1dev] restoring glance database backup from bacula into cloudcontrol2001-dev (T256283)

Trying to import the database from the backup failed unexpectedly:

root@cloudcontrol2001-dev:~# mysqlimport glance /var/tmp/bacula-restores/srv/backups/glance-202007030408.sql -u root
mysqlimport: Error: 2002 Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)

We should note in the docs exactly which command to use to re-import a database backup in this Galera setup.

OK, a couple of things:

root@cloudcontrol2001-dev:~# mysql -u root glance < /var/tmp/bacula-restores/srv/backups/glance.sql
  • After all this, the data was loaded and openstack is happy again:
root@cloudcontrol2004-dev:~# openstack image list
+--------------------------------------+---------------------------------------------+--------+
| ID                                   | Name                                        | Status |
+--------------------------------------+---------------------------------------------+--------+
| d1b2ea32-10ca-40a5-a3fc-babc3956f049 | debian-10.0-buster                          | active |
| f7f9a861-c227-4ea5-927b-571f11538d86 | debian-10.2.0-raw-upstream                  | active |
| 21dd4e70-487d-4c2a-9813-5b6997fae03e | debian-10.3-buster-upstream                 | active |
| cb24cf99-b77b-432e-a861-b5ff5fef95a0 | debian-9.11-stretch                         | active |
| 94321599-0b42-4f6f-8a80-67a2ff561870 | debian-9.11-stretch (deprecated 2019-12-18) | active |
| 23d2421f-43ab-4307-8e99-aaaaabc67d02 | debian-9.8-stretch (deprecated 2019-12-18)  | active |
| 1ab9141c-c713-4265-bd00-fab3e59aab69 | debian-buster (deprecated 2019-12-18)       | active |
+--------------------------------------+---------------------------------------------+--------+
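For the docs, the restore that worked above could be wrapped roughly like this. It is a sketch under assumptions: `restore_db` is a hypothetical helper, root access over the local socket is assumed, and the `CREATE DATABASE IF NOT EXISTS` step is my assumption about what is needed after a drop (the task does not say how the database was recreated). Note that `mysqlimport` loads delimited data files into tables, so a plain SQL dump is fed through the `mysql` client instead.

```shell
#!/bin/sh
# Hypothetical helper: restore one database from a plain SQL dump.
# Usage: restore_db <database> <dump.sql>
restore_db() {
    db="$1"
    dump="$2"

    if [ ! -r "$dump" ]; then
        echo "dump file not readable: $dump" >&2
        return 1
    fi
    # Error 2002 in the earlier attempt meant the server was not reachable,
    # so check that it answers before trying to load anything.
    if ! mysqladmin -u root ping >/dev/null 2>&1; then
        echo "mysql server is not answering; start the service first" >&2
        return 1
    fi
    # Assumption: the database was dropped and has to exist before the import.
    mysql -u root -e "CREATE DATABASE IF NOT EXISTS \`$db\`" || return 1
    # Same form as the command that worked in this task.
    mysql -u root "$db" < "$dump"
}
```

For example, `restore_db glance /var/tmp/bacula-restores/srv/backups/glance.sql` would repeat the import done here, after the backup has been restored from bacula.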

Mentioned in SAL (#wikimedia-cloud) [2020-07-03T12:51:57Z] <arturo> [codfw1dev] galera cluster should be up and running, openstack happy (T256283)

