Upgrade and rebuild s7
Closed, Resolved, Public

Description

  • dbstore1008
  • db2222
  • db2221
  • db2220
  • db2218
  • db2208
  • db2200 backup source (Jaime will do it)
  • db2198 backup source (Jaime will do it)
  • db2187
  • db2182
  • db2168
  • db2159
  • db2150
  • db1236
  • db1227
  • db1202
  • db1194
  • db1191
  • db1181
  • db1174
  • db1171
  • db1170
  • db1158
  • db1155
  • clouddb1018
  • clouddb1014
  • an-redacteddb1001

Event Timeline

Marostegui claimed this task.
Marostegui triaged this task as Medium priority.
Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui updated the task description.

Icinga downtime and Alertmanager silence (ID=f10b64ca-2317-4be0-8c0c-8c35e9d6e6c3) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db1236.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=5c4d1fa9-2ef4-47eb-b26c-fd569e00f9b7) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db2220.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=b270fdd1-ce8f-40a2-80b1-6a3bd47ff600) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db1236.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=8fd61b9d-19ac-42de-8036-6bced9ba4555) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db1227.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=3be8e339-d92d-451f-afa7-bab775f89da2) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db2222.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=f9c4baf3-3bd9-4341-abde-76a3bd605895) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db2221.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=17c1b9b4-eac9-486a-b8dd-749dd8fcb051) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db1202.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2025-02-06T06:58:00Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2208 db1194 T385550', diff saved to https://phabricator.wikimedia.org/P73275 and previous config saved to /var/cache/conftool/dbconfig/20250206-065759-marostegui.json

Icinga downtime and Alertmanager silence (ID=37147450-48c5-4574-8497-d49abab5ac37) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db2208.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=29e48ba7-187c-480c-bdd7-9508f1fc6874) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db1194.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2025-02-06T12:57:14Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2159 db1191 T385550', diff saved to https://phabricator.wikimedia.org/P73312 and previous config saved to /var/cache/conftool/dbconfig/20250206-125713-marostegui.json

Icinga downtime and Alertmanager silence (ID=8fc176d3-1ac2-44d1-8d2d-d3d7172243ed) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db1191.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=88f2b604-9411-4170-a7ce-6b73465e5c27) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db2159.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=366e04db-d729-476f-b441-5fc93f056a2c) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db2150.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=e74f6823-c7ba-4085-9cab-3396df3d7e6f) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db1174.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=f9845655-7483-4179-a6fb-a5ed88d6308a) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db1170.eqiad.wmnet
Marostegui added a subscriber: jcrespo.

@jcrespo I will leave db2200 and db2198 for you, as we discussed last week (no rush).

Icinga downtime and Alertmanager silence (ID=2f80616d-811c-4cec-9bb9-4be68c5eed42) set by jynus@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: upgrade and rebuild tables

db2200.codfw.wmnet
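The downtime durations in these entries ("12:00:00", "1 day, 0:00:00", "4 days, 0:00:00") have exactly the shape Python prints for a `datetime.timedelta`. When auditing a timeline like this one it can help to turn them back into numbers. A minimal sketch, assuming only those three shapes; `parse_downtime` is a hypothetical helper, not part of any Wikimedia tooling:

```python
import re
from datetime import timedelta

# Matches "12:00:00", "1 day, 0:00:00", "4 days, 0:00:00", i.e. the
# format produced by str(datetime.timedelta(...)) for whole-second values.
_DURATION_RE = re.compile(
    r"^(?:(?P<days>\d+) days?, )?(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})$"
)


def parse_downtime(text: str) -> timedelta:
    """Parse a downtime duration string into a timedelta (hypothetical helper)."""
    match = _DURATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return timedelta(
        days=int(match["days"] or 0),
        hours=int(match["h"]),
        minutes=int(match["m"]),
        seconds=int(match["s"]),
    )


for raw in ("12:00:00", "1 day, 0:00:00", "4 days, 0:00:00"):
    td = parse_downtime(raw)
    assert str(td) == raw  # round trip: str(timedelta) reproduces the log text
    print(raw, "->", td.total_seconds() / 3600, "hours")
```

The round-trip assertion is a cheap guard: if the log format ever changes, parsing fails loudly instead of silently misreading a window.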

Mentioned in SAL (#wikimedia-operations) [2025-02-21T07:22:39Z] <jynus> rebuilding tables for db2200 T385550

Icinga downtime and Alertmanager silence (ID=f86e9ddb-675e-4f4c-bac3-2f568013928a) set by jynus@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: upgrade and rebuild tables

db2198.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=0eb5c53d-4cad-442c-9972-328355758ff2) set by jynus@cumin1002 for 4 days, 0:00:00 on 2 host(s) and their services with reason: Table rebuilding ongoing

db[2198,2200].codfw.wmnet
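The last silence targets `db[2198,2200].codfw.wmnet`, which appears to be the folded host-list notation used by cumin (ClusterShell-style node sets) and stands for db2198 and db2200. As a rough, stdlib-only illustration of how such a pattern expands: `expand_hosts` is hypothetical and handles only a single bracket group with commas and `a-b` ranges, whereas a real node-set implementation supports much more.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand a host pattern with at most one bracket group.

    Hypothetical helper: supports comma-separated items and a-b ranges
    inside the brackets (keeping zero padding); anything fancier is out of scope.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        return [pattern]  # no bracket group: already a single host name
    prefix, body, suffix = match.groups()
    hosts = []
    for item in body.split(","):
        if "-" in item:
            start, end = item.split("-", 1)
            width = len(start)  # e.g. "01-03" keeps two digits
            for number in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{number:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{item}{suffix}")
    return hosts


print(expand_hosts("db[2198,2200].codfw.wmnet"))
# ['db2198.codfw.wmnet', 'db2200.codfw.wmnet']
```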

Once these two finish (they are technically already upgraded), db2199 will be the only backup source (not in s7) still to be upgraded from 10.6.17 to 10.6.20.
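Checking which backup sources are still behind a target release (10.6.20 here) should be a numeric comparison, not a string one: "10.6.17" < "10.6.20" happens to hold as strings, but "10.6.9" would wrongly sort after "10.6.20". A minimal sketch, assuming versions are dotted numbers optionally followed by a suffix such as "-MariaDB-log"; the helper names are illustrative, and the inventory is assumed from the comment above (db2198 and db2200 already on 10.6.20, db2199 still on 10.6.17):

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Turn "10.6.17" (or "10.6.17-MariaDB-log") into (10, 6, 17)."""
    numeric = version.split("-", 1)[0]  # drop any distribution suffix
    return tuple(int(part) for part in numeric.split("."))


def needs_upgrade(versions: dict[str, str], target: str) -> list[str]:
    """Return the hosts running a version older than target, sorted by name."""
    wanted = version_tuple(target)
    return sorted(host for host, v in versions.items() if version_tuple(v) < wanted)


# Assumed inventory based on the comment above.
inventory = {"db2198": "10.6.20", "db2199": "10.6.17", "db2200": "10.6.20"}
print(needs_upgrade(inventory, "10.6.20"))  # ['db2199']

# Plain string comparison breaks once a component gains a digit:
print("10.6.9" < "10.6.20")                                # False
print(version_tuple("10.6.9") < version_tuple("10.6.20"))  # True
```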

jcrespo updated the task description.

Backup sources are done; reassigning to @Marostegui to proceed or close. (Only db2199 is still missing 10.6.20; I will upgrade it now, but that is out of scope for this ticket.)

All InnoDB tables were rebuilt:

```
cumin2024@db1164.eqiad.wmnet[dbbackups]> SELECT A.file_path, A.file_name, A.size AS size_job_1, B.size AS size_job_2, ABS(CONVERT(B.size, SIGNED) - CONVERT(A.size, SIGNED)) AS difference FROM backup_files A JOIN backup_files B ON A.file_path = B.file_path AND A.file_name = B.file_name WHERE A.backup_id = 29091 AND B.backup_id = 29094 ORDER BY ABS(CONVERT(B.size, SIGNED) - CONVERT(A.size, SIGNED)) DESC;

+--------------------+-------------------------------------------------------+-------------+--------------+--------------+
| file_path          | file_name                                             | size_job_1  | size_job_2   | difference   |
+--------------------+-------------------------------------------------------+-------------+--------------+--------------+
|                    | ibdata1                                               |  5546967040 | 227310305280 | 221763338240 |
| itwiki             | change_tag.ibd                                        |  7675576320 |   3829399552 |   3846176768 |
| zhwiki             | change_tag.ibd                                        |  6157238272 |   3120562176 |   3036676096 |
| ptwiki             | change_tag.ibd                                        |  3837788160 |   1937768448 |   1900019712 |
| plwiki             | change_tag.ibd                                        |  2797600768 |   1392508928 |   1405091840 |
| idwiki             | change_tag.ibd                                        |  2835349504 |   1451229184 |   1384120320 |
| enwiktionary       | categorylinks.ibd                                     |  8363442176 |   9747562496 |   1384120320 |
| trwiki             | change_tag.ibd                                        |  2554331136 |   1279262720 |   1275068416 |
| nlwiki             | change_tag.ibd                                        |  2126512128 |   1073741824 |   1052770304 |
| svwiki             | categorylinks.ibd                                     |  2604662784 |   3565158400 |    960495616 |
| itwiki             | categorylinks.ibd                                     |  4848615424 |   5809111040 |    960495616 |
| enwiktionary       | change_tag.ibd                                        |  1807745024 |    922746880 |    884998144 |
| nlwiki             | categorylinks.ibd                                     |  1656750080 |   2411724800 |    754974720 |
| zhwiki             | categorylinks.ibd                                     |  3938451456 |   4638900224 |    700448768 |
| svwiki             | change_tag.ibd                                        |  1358954496 |    683671552 |    675282944 |
| ptwiki             | text.ibd                                              |  4932501504 |   4286578688 |    645922816 |
| cswiki             | change_tag.ibd                                        |  1266679808 |    633339904 |    633339904 |
| nowiki             | categorylinks.ibd                                     |  1191182336 |   1728053248 |    536870912 |
| svwiki             | templatelinks.ibd                                     |  4852809728 |   4324327424 |    528482304 |
| idwiki             | categorylinks.ibd                                     |  2063597568 |   2583691264 |    520093696 |
| ptwiki             | categorylinks.ibd                                     |  2608857088 |   3112173568 |    503316480 |
| thwiki             | change_tag.ibd                                        |   964689920 |    482344960 |    482344960 |
| fiwiki             | change_tag.ibd                                        |   843055104 |    423624704 |    419430400 |
| plwiki             | categorylinks.ibd                                     |  1719664640 |   2109734912 |    390070272 |
| itwiki             | ip_changes.ibd                                        |  1870659584 |   2252341248 |    381681664 |
| nlwiki             | content.ibd                                           |  3300917248 |   3682598912 |    381681664 |
| nowiki             | change_tag.ibd                                        |   759169024 |    385875968 |    373293056 |
| enwiktionary       | content.ibd                                           |  4022337536 |   4378853376 |    356515840 |
| zhwiki             | pagelinks.ibd                                         | 13333692416 |  13014925312 |    318767104 |
| svwiki             | content.ibd                                           |  2659188736 |   2977955840 |    318767104 |
| bgwiki             | change_tag.ibd                                        |   612368384 |    306184192 |    306184192 |
| svwiki             | imagelinks.ibd                                        |  1346371584 |   1040187392 |    306184192 |
| trwiki             | categorylinks.ibd                                     |  1732247552 |   2017460224 |    285212672 |
| itwiki             | pagelinks.ibd                                         | 12826181632 |  12566134784 |    260046848 |
| svwiki             | text.ibd                                              |  3355443200 |   3095396352 |    260046848 |
4017 rows in set (0.165 sec)
```
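The query above self-joins `backup_files` on (file_path, file_name) between backup ids 29091 and 29094 and ranks files by absolute size change. The same comparison can be sketched in Python for two file listings obtained some other way; the input shape (dicts keyed by path and name, sizes in bytes) and the `size_differences` name are assumptions for illustration, not part of the backup tooling:

```python
def size_differences(
    job_1: dict[tuple[str, str], int], job_2: dict[tuple[str, str], int]
) -> list[tuple[str, str, int, int, int]]:
    """Rank files present in both backups by absolute size change.

    Mirrors the SQL self-join: inner join on (file_path, file_name),
    difference = ABS(size_2 - size_1), ORDER BY difference DESC.
    """
    rows = []
    # Intersecting the key sets behaves like the inner JOIN.
    for file_path, file_name in job_1.keys() & job_2.keys():
        size_1 = job_1[(file_path, file_name)]
        size_2 = job_2[(file_path, file_name)]
        rows.append((file_path, file_name, size_1, size_2, abs(size_2 - size_1)))
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows


# Three rows copied from the output above (sizes in bytes).
job_1 = {
    ("", "ibdata1"): 5546967040,
    ("itwiki", "change_tag.ibd"): 7675576320,
    ("enwiktionary", "categorylinks.ibd"): 8363442176,
}
job_2 = {
    ("", "ibdata1"): 227310305280,
    ("itwiki", "change_tag.ibd"): 3829399552,
    ("enwiktionary", "categorylinks.ibd"): 9747562496,
}

for row in size_differences(job_1, job_2):
    print(row)
```

The `CONVERT(..., SIGNED)` casts in the SQL are presumably there because `size` is an unsigned column, where MariaDB rejects a subtraction that would go negative; Python integers need no such cast. Dropping the `abs()` (and the SQL `ABS`) would keep the sign, making it visible that the change_tag.ibd files shrank while the categorylinks.ibd files grew.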

Icinga downtime and Alertmanager silence (ID=70c6ecbd-6a05-4a34-925b-e609312bb73d) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db1158.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=3e9877ea-2d47-4259-904b-9c710818c63d) set by root@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: Index rebuild

db2218.codfw.wmnet
Marostegui updated the task description.

All done