As seen in T375652#10238434, we might have spotted an upgrade issue between MariaDB 10.6.17 and 10.6.19.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | ABran-WMF | T374425 db2205 stuck replication/processlist
Resolved | | ABran-WMF | T377276 db2205 is stuck at "Shutdown in progress"
Open | PRODUCTION ERROR | Ladsgroup | T375652 Wikimedia\Rdbms\DBQueryError: Error 1062: Duplicate entry '1' for key 'PRIMARY' Function: MediaWiki\CheckUser\Services\CheckUserLogService::addLogEntry
Resolved | | ABran-WMF | T377718 db2205 and db2227 need to be recloned from 10.6.17
Event Timeline
This will be a bit harder than initially expected, as our repo only offers the latest version of this package and no older ones:
$ sudo apt-cache madison wmf-mariadb106
wmf-mariadb106 | 10.6.19+deb12u1 | http://apt.wikimedia.org/wikimedia bookworm-wikimedia/main amd64 Packages
I can revert this way:
root@db2207:/var/cache/apt/archives# ls -lahrt|rg wmf-mariadb
-rw-r--r-- 1 root root 129M Nov 23 2023 wmf-mariadb106_10.6.16+deb12u1_amd64.deb
-rw-r--r-- 1 root root 130M Jun 7 05:08 wmf-mariadb106_10.6.18+deb12u1_amd64.deb
I even got the "right version" available:
arnaudb@cumin1002:~ $ sudo cumin 'db11*' 'ls -l /var/cache/apt/archives/wmf-mariadb*'
51 hosts will be targeted:
db[1125,1150-1199].eqiad.wmnet
OK to proceed on 51 hosts? Enter the number of affected hosts to confirm or "q" to quit: 51
===== NODE GROUP =====
(2) db[1150,1171].eqiad.wmnet
----- OUTPUT of 'ls -l /var/cache...ves/wmf-mariadb*' -----
-rw-r--r-- 1 root root 281319080 Mar 25 2022 /var/cache/apt/archives/wmf-mariadb104_10.4.22+deb11u2_amd64.deb
-rw-r--r-- 1 root root 135514296 Mar 19 2024 /var/cache/apt/archives/wmf-mariadb106_10.6.17+deb11u1_amd64.deb
===== NODE GROUP =====
(2) db[1164,1195].eqiad.wmnet
----- OUTPUT of 'ls -l /var/cache...ves/wmf-mariadb*' -----
-rw-r--r-- 1 root root 134129568 Oct 18 2023 /var/cache/apt/archives/wmf-mariadb106_10.6.14+deb12u1_amd64.deb
===== NODE GROUP =====
(8) db[1151,1159,1168,1170,1176,1179-1180,1187].eqiad.wmnet
----- OUTPUT of 'ls -l /var/cache...ves/wmf-mariadb*' -----
-rw-r--r-- 1 root root 135085384 Nov 23 2023 /var/cache/apt/archives/wmf-mariadb106_10.6.16+deb12u1_amd64.deb
===== NODE GROUP =====
(2) db[1165,1173].eqiad.wmnet
----- OUTPUT of 'ls -l /var/cache...ves/wmf-mariadb*' -----
-rw-r--r-- 1 root root 135085384 Nov 23 2023 /var/cache/apt/archives/wmf-mariadb106_10.6.16+deb12u1_amd64.deb
-rw-r--r-- 1 root root 135777632 Mar 20 2024 /var/cache/apt/archives/wmf-mariadb106_10.6.17+deb12u1_amd64.deb
===== NODE GROUP =====
(36) db[1125,1152-1158,1160,1162-1163,1166-1167,1169,1172,1174-1175,1177-1178,1181-1186,1188-1194,1196-1199].eqiad.wmnet
----- OUTPUT of 'ls -l /var/cache...ves/wmf-mariadb*' -----
-rw-r--r-- 1 root root 135777632 Mar 20 2024 /var/cache/apt/archives/wmf-mariadb106_10.6.17+deb12u1_amd64.deb
===== NODE GROUP =====
(1) db1161.eqiad.wmnet
----- OUTPUT of 'ls -l /var/cache...ves/wmf-mariadb*' -----
-rw-r--r-- 1 root root 135777632 Mar 20 2024 /var/cache/apt/archives/wmf-mariadb106_10.6.17+deb12u1_amd64.deb
-rw-r--r-- 1 root root 137057532 Aug 23 13:07 /var/cache/apt/archives/wmf-mariadb106_10.6.19+deb12u1_amd64.deb
@Ladsgroup are you OK with a manual reinstall+reclone?
Installing via dpkg doesn't sound too terrible to me but @jcrespo might know something I might have missed. Also if possible, check the hash of the package files against our repos?
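A minimal sketch of the manual downgrade discussed above, assuming the wanted .deb is still present in the local apt cache. The helper name `cached_deb` and the dry-run `echo` are illustrative only; this is not the procedure that was actually run on the hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of a cached .deb for a package/version,
# or return non-zero if the given cache directory does not hold it.
cached_deb() {
    local cache_dir="$1" pkg="$2" version="$3"
    local deb="${cache_dir}/${pkg}_${version}_amd64.deb"
    [ -f "$deb" ] && printf '%s\n' "$deb"
}

# Dry run: only print the install command; remove the echo to really install.
if deb="$(cached_deb /var/cache/apt/archives wmf-mariadb106 10.6.17+deb12u1)"; then
    echo sudo dpkg -i "$deb"
else
    echo "no cached package for that version on this host" >&2
fi
```

Keeping the lookup separate from the install step makes it easy to check first which hosts still have the file before touching any of them.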
thanks @jcrespo
@Ladsgroup as for the deb files hashes
on apt:
root@apt1002:marostegui $ sha256sum bookworm/bookworm/wmf-mariadb106_10.6.17+deb12u1_amd64.deb
00e19c3bbd1599e43595da687d25d3921b4d7e6fc7d2e2db528a654965964080 bookworm/bookworm/wmf-mariadb106_10.6.17+deb12u1_amd64.deb
on db2207:
arnaudb@db2207:~ $ sha256sum wmf-mariadb106_10.6.17+deb12u1_amd64.deb
00e19c3bbd1599e43595da687d25d3921b4d7e6fc7d2e2db528a654965964080 wmf-mariadb106_10.6.17+deb12u1_amd64.deb
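The comparison above was done by eye. A small sketch of how the same check could be scripted, assuming GNU `sha256sum` is available; `sha256_matches` is a hypothetical helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
sha256_matches() {
    local file="$1" expected="$2" actual
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(sha256sum "$file" | cut -d' ' -f1)"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example (commented out, since the file only exists on the db host):
# sha256_matches wmf-mariadb106_10.6.17+deb12u1_amd64.deb \
#     00e19c3bbd1599e43595da687d25d3921b4d7e6fc7d2e2db528a654965964080 \
#     && echo "local copy matches the one on apt"
```

The function returns non-zero both on a mismatch and when the file cannot be read, so a failure should always be treated as "do not install".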
Mentioned in SAL (#wikimedia-operations) [2024-10-22T07:58:30Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'T377718', diff saved to https://phabricator.wikimedia.org/P70456 and previous config saved to /var/cache/conftool/dbconfig/20241022-075830-arnaudb.json
@Ladsgroup if you want to validate it, db2205 is ready and recloned from (and to) 10.6.17.
Mentioned in SAL (#wikimedia-operations) [2024-10-22T12:07:53Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Depool db2149 and db2227 - T377718', diff saved to https://phabricator.wikimedia.org/P70479 and previous config saved to /var/cache/conftool/dbconfig/20241022-120753-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T12:12:18Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2205 (re)pooling @ 5%: T377718', diff saved to https://phabricator.wikimedia.org/P70480 and previous config saved to /var/cache/conftool/dbconfig/20241022-121218-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T12:27:24Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2205 (re)pooling @ 10%: T377718', diff saved to https://phabricator.wikimedia.org/P70483 and previous config saved to /var/cache/conftool/dbconfig/20241022-122723-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T12:42:29Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2205 (re)pooling @ 25%: T377718', diff saved to https://phabricator.wikimedia.org/P70485 and previous config saved to /var/cache/conftool/dbconfig/20241022-124228-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T12:57:34Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2205 (re)pooling @ 50%: T377718', diff saved to https://phabricator.wikimedia.org/P70487 and previous config saved to /var/cache/conftool/dbconfig/20241022-125734-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T13:12:40Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2205 (re)pooling @ 75%: T377718', diff saved to https://phabricator.wikimedia.org/P70489 and previous config saved to /var/cache/conftool/dbconfig/20241022-131239-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T13:27:46Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2205 (re)pooling @ 100%: T377718', diff saved to https://phabricator.wikimedia.org/P70493 and previous config saved to /var/cache/conftool/dbconfig/20241022-132745-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T13:41:26Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2227 (re)pooling @ 5%: T377718', diff saved to https://phabricator.wikimedia.org/P70495 and previous config saved to /var/cache/conftool/dbconfig/20241022-134126-arnaudb.json
both hosts have been downgraded and recloned:
arnaudb@cumin1002:~ $ sudo cumin 'db2205*,db2227*' 'dpkg -l|rg wmf-maria'
2 hosts will be targeted:
db[2205,2227].codfw.wmnet
OK to proceed on 2 hosts? Enter the number of affected hosts to confirm or "q" to quit: 2
===== NODE GROUP =====
(2) db[2205,2227].codfw.wmnet
----- OUTPUT of 'dpkg -l|rg wmf-maria' -----
ii wmf-mariadb106 10.6.17+deb12u1 amd64 MariaDB plus patches.
================
PASS |████████████████████| 100% (2/2) [00:00<00:00, 2.03hosts/s]
FAIL | | 0% (0/2) [00:00<?, ?hosts/s]
100.0% (2/2) success ratio (>= 100.0% threshold) for command: 'dpkg -l|rg wmf-maria'.
100.0% (2/2) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Mentioned in SAL (#wikimedia-operations) [2024-10-22T13:56:32Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: T377718', diff saved to https://phabricator.wikimedia.org/P70498 and previous config saved to /var/cache/conftool/dbconfig/20241022-135631-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T14:11:37Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: T377718', diff saved to https://phabricator.wikimedia.org/P70502 and previous config saved to /var/cache/conftool/dbconfig/20241022-141137-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T14:26:43Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2227 (re)pooling @ 50%: T377718', diff saved to https://phabricator.wikimedia.org/P70505 and previous config saved to /var/cache/conftool/dbconfig/20241022-142642-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T14:41:48Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: T377718', diff saved to https://phabricator.wikimedia.org/P70510 and previous config saved to /var/cache/conftool/dbconfig/20241022-144148-arnaudb.json
Mentioned in SAL (#wikimedia-operations) [2024-10-22T14:56:54Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: T377718', diff saved to https://phabricator.wikimedia.org/P70512 and previous config saved to /var/cache/conftool/dbconfig/20241022-145653-arnaudb.json
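The SAL entries above show each host being repooled in steps of 5, 10, 25, 50, 75 and 100%, roughly 15 minutes apart. A minimal dry-run sketch of that schedule follows; `repool_plan` is a hypothetical name, and it only prints the commit messages. The real steps were applied through dbctl, whose exact invocation is not shown in this task.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: print one commit message per repooling step,
# in the same shape as the messages logged in SAL.
repool_plan() {
    local host="$1" task="$2" pct
    for pct in 5 10 25 50 75 100; do
        printf '%s (re)pooling @ %s%%: %s\n' "$host" "$pct" "$task"
    done
}

repool_plan db2227 T377718
```

Ramping up gradually gives a freshly recloned replica time to warm its caches and leaves room to stop early if errors show up at a low percentage.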
Good one Jaime :-) I always leave some of the oldest versions there so we can roll back if needed. I am glad you are able to read me so well :)