Migrate backup1-* replicas to MariaDB 10.6
Closed, Resolved. Public

Description

Let's migrate the replicas from backup1-{eqiad,codfw} to MariaDB 10.6

  • db1205
  • db2184

Event Timeline

Marostegui moved this task from Triage to In progress on the DBA board.

Change 888673 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1205,db2184: Migrate them to MariaDB 10.6.12

https://gerrit.wikimedia.org/r/888673

Change 888673 merged by Marostegui:

[operations/puppet@production] db1205,db2184: Migrate them to MariaDB 10.6.12

https://gerrit.wikimedia.org/r/888673

Mentioned in SAL (#wikimedia-operations) [2023-02-13T12:15:16Z] <marostegui> Upgrade db1205 and db2184 to mariadb 10.6.12 T329499

@jcrespo db1205 and db2184 have been migrated to 10.6.12 - could you check if all looks good from your side?

Data & service look good, but let's wait until tonight's backup & check to confirm.

Yes, although sadly I didn't have the time to test the 10.6 recovery process, I intend to do it this week.

Let's wait for the recovery before closing then. Thanks

Recovery took ~1h30 with the script:

# mini_loader.sh dump.backup1-codfw.2023-02-28--03-45-08
Starting recovery at 2023-03-01 10:31:50+00:00
[...]
Finishing recovery at 2023-03-01 11:58:33+00:00

The total size recovered is:

root@db2184:/srv/sqldata$ du -hs
152G    .
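
For a rough sense of recovery speed, the timestamps and size above can be turned into a duration and an average rate. This is only a sketch: the function name `recovery_stats` is made up for illustration, and it assumes the `du -hs` figure is in GiB (powers of 1024) and reflects the on-disk size after recovery, not the size of the dump itself.

```python
from datetime import datetime


def recovery_stats(start: str, end: str, size_gib: float):
    """Return (elapsed seconds, average MiB/s) for a recovery run.

    start/end are the timestamps printed by the recovery script;
    size_gib is the recovered on-disk size (assumed to be in GiB).
    """
    t0 = datetime.fromisoformat(start)
    t1 = datetime.fromisoformat(end)
    seconds = (t1 - t0).total_seconds()
    return seconds, size_gib * 1024 / seconds


seconds, mib_per_s = recovery_stats(
    "2023-03-01 10:31:50+00:00",
    "2023-03-01 11:58:33+00:00",
    152,
)
print(f"{seconds:.0f} s elapsed, about {mib_per_s:.1f} MiB/s on average")
```

With the values from this run that works out to 5203 seconds (about 1h27), or roughly 30 MiB/s of recovered data on disk.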

I will now do a complete data check compared to the primary host and restart replication.

I intend to leave this recovery script fully set up, as well as the modified documentation recommending its use for recoveries, before this Friday.

Data check looks good:

mediawiki metadata table
2023-03-01T12:19:56.993934: row id 104990001/105492898, ETA: 00m02s, 0 chunk(s) found different
Execution ended, no differences found.
backups metadata table
root@db2183.codfw.wmnet[mediabackups]> SELECT count(*) FROM backups;
+-----------+
| count(*)  |
+-----------+
| 102602541 |
+-----------+
1 row in set (24.277 sec)

root@db2184.codfw.wmnet[mediabackups]> SELECT count(*) FROM backups;
+-----------+
| count(*)  |
+-----------+
| 102602541 |
+-----------+
1 row in set (34.382 sec)

(these 2 are the only ones with real data that cannot be lost)
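
The row-count comparison above could be scripted along these lines. This is a minimal sketch, not what was actually run: `parse_count` and `counts_match` are hypothetical helpers that only read pasted `mysql` client output, and matching counts are a weaker signal than the chunked comparison used for the mediawiki metadata table.

```python
import re


def parse_count(mysql_output: str) -> int:
    """Extract the single integer cell from a mysql client ASCII table."""
    for line in mysql_output.splitlines():
        # Match a data row such as "| 102602541 |"; header and border
        # lines contain no bare integer cell and are skipped.
        match = re.fullmatch(r"\|\s*(\d+)\s*\|", line.strip())
        if match:
            return int(match.group(1))
    raise ValueError("no numeric row found in output")


def counts_match(reference_output: str, recovered_output: str) -> bool:
    """True if both hosts report the same row count."""
    return parse_count(reference_output) == parse_count(recovered_output)


sample = """\
+-----------+
| count(*)  |
+-----------+
| 102602541 |
+-----------+
1 row in set (24.277 sec)
"""
print(counts_match(sample, sample))
```

Both counts need to be taken while writes to the table are stopped, or at the same replication position, otherwise a mismatch may simply reflect rows inserted between the two queries.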

Removing original copy and returning db2184 to production.

We can resolve this, and either upgrade their respective primaries or proceed with the misc replicas.

Great news, I am going to create a ticket for the primaries.

Change 893803 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] Moving the working prototype/hack into production

https://gerrit.wikimedia.org/r/893803

Change 893803 merged by Jcrespo:

[operations/puppet@production] dbbackups: Moving the recovery working prototype/hack into production

https://gerrit.wikimedia.org/r/893803