Upgrade masters to 10.6.22 and 10.11.13 .2 update
Closed, ResolvedPublic

Description

In T397425 (Build 10.6.22 and 10.11.13 with mdev36934 patch) we had to patch our current 10.6.22 and 10.11.13 builds with an update, as we were bitten by https://jira.mariadb.org/browse/MDEV-36934.
Today, during T399446 (Switchover s5 master (db1230 -> db1210)), the primary wasn't patched, so we got bitten again: the switchover failed when inverting replication and had to be finished manually.

Masters progress:

  • db1151.eqiad.wmnet
  • db1152.eqiad.wmnet
  • db1153.eqiad.wmnet
  • db1163.eqiad.wmnet
  • db1176.eqiad.wmnet (test host)
  • db1193.eqiad.wmnet
  • db1201.eqiad.wmnet
  • db1204.eqiad.wmnet
  • db1210.eqiad.wmnet
  • db1213.eqiad.wmnet
  • db1215.eqiad.wmnet (zarcillo)
  • db1220.eqiad.wmnet
  • db1222.eqiad.wmnet
  • db1223.eqiad.wmnet
  • db1228.eqiad.wmnet
  • db1236.eqiad.wmnet
  • db1244.eqiad.wmnet
  • db1250.eqiad.wmnet
  • db1255.eqiad.wmnet
  • es1035.eqiad.wmnet
  • es1038.eqiad.wmnet
  • pc1011.eqiad.wmnet
  • pc1012.eqiad.wmnet
  • pc1013.eqiad.wmnet
  • pc1014.eqiad.wmnet
  • pc1015.eqiad.wmnet
  • pc1016.eqiad.wmnet
  • pc1017.eqiad.wmnet
  • pc1018.eqiad.wmnet
  • db2142.codfw.wmnet
  • db2143.codfw.wmnet
  • db2144.codfw.wmnet
  • db2151.codfw.wmnet
  • db2165.codfw.wmnet
  • db2179.codfw.wmnet
  • db2183.codfw.wmnet
  • db2196.codfw.wmnet
  • db2203.codfw.wmnet
  • db2204.codfw.wmnet
  • db2209.codfw.wmnet
  • db2213.codfw.wmnet
  • db2214.codfw.wmnet
  • db2218.codfw.wmnet
  • db2232.codfw.wmnet
  • db2233.codfw.wmnet
  • db2234.codfw.wmnet
  • db2235.codfw.wmnet
  • db2241.codfw.wmnet
  • es2037.codfw.wmnet
  • es2038.codfw.wmnet
  • pc2011.codfw.wmnet
  • pc2012.codfw.wmnet
  • pc2013.codfw.wmnet
  • pc2014.codfw.wmnet
  • pc2015.codfw.wmnet
  • pc2016.codfw.wmnet
  • pc2017.codfw.wmnet
  • pc2018.codfw.wmnet

Event Timeline

Completed depool of db2165 - Upgrading db2165.codfw.wmnet - fceratto@cumin1002

Start pool of db2165 gradually with 4 steps - Upgrade of db2165.codfw.wmnet completed - fceratto@cumin1002

Completed pool of db2165 gradually with 4 steps - Upgrade of db2165.codfw.wmnet completed - fceratto@cumin1002

Upgrade of db2165.codfw.wmnet completed

Depooled db1152.eqiad.wmnet and db2142.codfw.wmnet Depool for MariaDB upgrade - fceratto@cumin1002 - T399540

Upgrade of db2142.codfw.wmnet completed

Upgrade of db1152.eqiad.wmnet completed

Depooled db1152.eqiad.wmnet and db2142.codfw.wmnet Repool after MariaDB upgrade - fceratto@cumin1002 - T399540

Depooled db1153.eqiad.wmnet and db2143.codfw.wmnet Depool for MariaDB upgrade - fceratto@cumin1002 - T399540

Upgrade of db2143.codfw.wmnet completed

Upgrade of db1153.eqiad.wmnet completed

Depooled db1153.eqiad.wmnet and db2143.codfw.wmnet Repool after MariaDB upgrade - fceratto@cumin1002 - T399540

Depooled db1151.eqiad.wmnet and db2144.codfw.wmnet Depool for MariaDB upgrade - fceratto@cumin1002 - T399540

Upgrade of db1151.eqiad.wmnet completed

Upgrade of db2144.codfw.wmnet completed

Depooled db1151.eqiad.wmnet and db2144.codfw.wmnet Depool for MariaDB upgrade - fceratto@cumin1002 - T399540

db1176 is at version 10.6.22
db2183 has 10.11.13+deb12u2 installed and was rebooted 18 days ago
db2204 has 10.11.13+deb12u1 installed, so it probably needs an update
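
The check in the three lines above can be sketched as a small helper. This is only an illustration: it assumes that `+deb12u2` is the patched package revision (inferred from the db2183 and db2204 lines), and the function name `needs_update` is hypothetical. A version string without a Debian revision suffix, such as the one reported for db1176, is treated as undetermined rather than guessed.

```python
import re
from typing import Optional

# Assumption: revision "u2" is the patched build, inferred from the comment
# above (db2183 on +deb12u2 is fine, db2204 on +deb12u1 probably is not).
PATCHED_REVISION = 2


def needs_update(version: str) -> Optional[bool]:
    """Return True if the Debian revision is older than the patched one,
    False if it is at or above it, and None if the string carries no
    "+debNNuM" suffix and so cannot be classified."""
    match = re.search(r"\+deb\d+u(\d+)$", version)
    if match is None:
        return None
    return int(match.group(1)) < PATCHED_REVISION


# Versions exactly as reported in the comment above.
reported = {
    "db1176": "10.6.22",
    "db2183": "10.11.13+deb12u2",
    "db2204": "10.11.13+deb12u1",
}

for host, version in reported.items():
    print(host, version, needs_update(version))
```

Returning `None` for an unsuffixed version keeps the helper from claiming more than the reported string supports.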

Completed depool of db2179 - Upgrading db2179.codfw.wmnet - fceratto@cumin1002

Upgrade of db2179.codfw.wmnet completed

Start pool of db2179* gradually with 4 steps - Upgrade MariaDB - fceratto@cumin1002

Completed pool of db2179* gradually with 4 steps - Upgrade MariaDB - fceratto@cumin1002

Completed depool of db2204 - Upgrading db2204.codfw.wmnet - fceratto@cumin1002

Upgrade of db2204.codfw.wmnet completed

Start pool of db2204* gradually with 4 steps - Upgraded MariaDB - fceratto@cumin1002

Completed pool of db2204* gradually with 4 steps - Upgraded MariaDB - fceratto@cumin1002
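
For readers unfamiliar with gradual pooling, the idea of "gradually with 4 steps" can be sketched as a ramp of traffic percentages. The log does not say which percentages the tooling uses, so the evenly spaced values below are an assumption, and `pool_steps` is a hypothetical helper rather than part of any cookbook.

```python
from typing import List


def pool_steps(steps: int = 4) -> List[int]:
    """Return the pooled percentage after each step.

    Assumption: steps are evenly spaced and end at 100%; the real
    tooling may use a different ramp.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [round(100 * i / steps) for i in range(1, steps + 1)]


print(pool_steps(4))
```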

Masters left: s3, s7 and s8 in eqiad; a couple of backup sections in eqiad and codfw; db_inventory; and test-s4.

Started cloning db1172.eqiad.wmnet to db1193.eqiad.wmnet - ladsgroup@cumin1002

Like the stupid person I am, I didn't check whether the candidate masters have the new version or not. Another round is probably needed wherever the new master is still not on the newer version.

Start pool of db1172 gradually with 4 steps - Pool db1172.eqiad.wmnet in after cloning - ladsgroup@cumin1002

s3 is okay, but s1, s4 and s8 need another round of switchovers.

Ladsgroup updated the task description.

Completed pool of db1172 gradually with 4 steps - Pool db1172.eqiad.wmnet in after cloning - ladsgroup@cumin1002

So these sections are needed:

  • s1
    • Upgrading the candidate master
    • Switching over
  • s4
    • Upgrading the candidate master
    • Switching over
  • s7
    • Upgrading the candidate master
    • Switching over
  • s8
    • Upgrading the candidate master
    • Switching over
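
The checklist above can be sketched as a pre-switchover check that only schedules the upgrade when the candidate master is not yet on the patched build. The `Section` type, its `candidate_patched` field and the `plan` function are hypothetical names for illustration; the `False` values reflect the list above, where every remaining section still needs its candidate upgraded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    name: str
    # Whether the candidate master already runs the patched build.
    candidate_patched: bool


def plan(section: Section) -> List[str]:
    """List the remaining steps for one section, in order."""
    steps = []
    if not section.candidate_patched:
        steps.append(f"{section.name}: upgrade the candidate master")
    steps.append(f"{section.name}: switch over")
    return steps


# Sections taken from the list above; none has a patched candidate yet.
remaining = [Section(name, candidate_patched=False) for name in ("s1", "s4", "s7", "s8")]

for section in remaining:
    for step in plan(section):
        print(step)
```

Checking the candidate's version first is what would have avoided the extra round of switchovers described in the earlier comment.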

I'll keep editing this comment.

Start pool of db1193 gradually with 4 steps - Pool db1193.eqiad.wmnet in after cloning - ladsgroup@cumin1002

Completed pool of db1193 gradually with 4 steps - Pool db1193.eqiad.wmnet in after cloning - ladsgroup@cumin1002

Finished cloning db1172.eqiad.wmnet to db1193.eqiad.wmnet - ladsgroup@cumin1002

Completed depool of db1173 - Upgrading db1173.eqiad.wmnet - fceratto@cumin1002

Upgrade of db1173.eqiad.wmnet completed

Completed depool of db1173 - Upgrading db1173.eqiad.wmnet - fceratto@cumin1002

Start pool of db1173 gradually with 4 steps - Upgrade of db1173.eqiad.wmnet completed - fceratto@cumin1002

Completed depool of db2191 - Upgrading db2191.codfw.wmnet - fceratto@cumin1002

Start pool of db2191 gradually with 4 steps - Upgrade of db2191.codfw.wmnet completed - fceratto@cumin1002

Completed pool of db1173 gradually with 4 steps - Upgrade of db1173.eqiad.wmnet completed - fceratto@cumin1002

Upgrade of db1173.eqiad.wmnet completed

Completed pool of db2191 gradually with 4 steps - Upgrade of db2191.codfw.wmnet completed - fceratto@cumin1002

Upgrade of db2191.codfw.wmnet completed

Mentioned in SAL (#wikimedia-operations) [2025-09-10T12:55:02Z] <ladsgroup@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1215.eqiad.wmnet with reason: Glow up (T399540 T394371)

Ladsgroup moved this task from In progress to Done on the DBA board.
Ladsgroup updated the task description.