
Upgrade s4 to Buster + MariaDB 10.4
Closed, ResolvedPublic

Description

Given that the switch back will be in September, let's fully upgrade s4 to Buster and MariaDB 10.4.

  • Available backup source on the standby DC (db1150 - all sections have sources everywhere now)
  • Switch over backup generation on the standby DC (waiting for DBAs to merge): https://gerrit.wikimedia.org/r/c/operations/puppet/+/714704
  • Candidate master on the standby DC
  • Master on the standby DC
  • Candidate master on the primary DC
  • Available backup source on the primary DC (db2139 - all sections have sources everywhere now)
  • Switch over backup generation on the primary DC (waiting for switchover date): https://gerrit.wikimedia.org/r/c/operations/puppet/+/715919
  • Switchover on the primary DC to promote a Buster+10.4 host to master: T289650
  • Upgrade the old master and make it a candidate master, pool it
  • Cleanup (remove) old backup sources from both DCs

Please read the procedure documentation for more details.


Event Timeline


Mentioned in SAL (#wikimedia-operations) [2021-09-02T11:28:12Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17161 and previous config saved to /var/cache/conftool/dbconfig/20210902-112812-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T11:43:16Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17162 and previous config saved to /var/cache/conftool/dbconfig/20210902-114315-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T11:58:20Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17163 and previous config saved to /var/cache/conftool/dbconfig/20210902-115819-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T12:13:23Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17164 and previous config saved to /var/cache/conftool/dbconfig/20210902-121323-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T12:28:27Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17165 and previous config saved to /var/cache/conftool/dbconfig/20210902-122826-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T12:44:34Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2119 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17167 and previous config saved to /var/cache/conftool/dbconfig/20210902-124434-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T12:59:38Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17168 and previous config saved to /var/cache/conftool/dbconfig/20210902-125937-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T13:14:41Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17169 and previous config saved to /var/cache/conftool/dbconfig/20210902-131441-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T13:29:45Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17171 and previous config saved to /var/cache/conftool/dbconfig/20210902-132945-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T13:44:49Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17172 and previous config saved to /var/cache/conftool/dbconfig/20210902-134448-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T14:03:58Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17174 and previous config saved to /var/cache/conftool/dbconfig/20210902-140357-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T14:19:02Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17175 and previous config saved to /var/cache/conftool/dbconfig/20210902-141901-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T14:34:05Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17176 and previous config saved to /var/cache/conftool/dbconfig/20210902-143405-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T14:49:09Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17177 and previous config saved to /var/cache/conftool/dbconfig/20210902-144908-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-02T15:04:13Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17178 and previous config saved to /var/cache/conftool/dbconfig/20210902-150412-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T05:11:25Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17193 and previous config saved to /var/cache/conftool/dbconfig/20210903-051124-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T05:11:50Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17194 and previous config saved to /var/cache/conftool/dbconfig/20210903-051149-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T05:26:28Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17195 and previous config saved to /var/cache/conftool/dbconfig/20210903-052628-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T05:26:54Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17196 and previous config saved to /var/cache/conftool/dbconfig/20210903-052653-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T05:41:32Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17198 and previous config saved to /var/cache/conftool/dbconfig/20210903-054131-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T05:41:57Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17199 and previous config saved to /var/cache/conftool/dbconfig/20210903-054157-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T05:56:35Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17200 and previous config saved to /var/cache/conftool/dbconfig/20210903-055635-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T05:57:01Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17201 and previous config saved to /var/cache/conftool/dbconfig/20210903-055700-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T06:11:39Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17202 and previous config saved to /var/cache/conftool/dbconfig/20210903-061138-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-03T06:12:04Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17203 and previous config saved to /var/cache/conftool/dbconfig/20210903-061204-root.json
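The repool entries above follow a stepped pattern: each reimaged instance goes to 10%, 25%, 50%, 75% and then 100%, roughly 15 minutes apart. A minimal sketch of that schedule follows, assuming a hypothetical helper named `repool_schedule`; it only builds timestamps and commit messages in the format seen in the log and does not call dbctl or reproduce the actual repooling tooling.

```python
from datetime import datetime, timedelta, timezone


def repool_schedule(instance, task, start,
                    steps=(10, 25, 50, 75, 100),
                    interval_minutes=15,
                    reason="Slowly repool after reimage"):
    """Return (timestamp, commit message) pairs for a stepped repool.

    Hypothetical helper: the step values and interval are read off the
    SAL entries in this task, not taken from any tool's defaults.
    """
    schedule = []
    for i, pct in enumerate(steps):
        when = start + timedelta(minutes=interval_minutes * i)
        message = f"{instance} (re)pooling @ {pct}%: {reason} {task}"
        schedule.append((when, message))
    return schedule


if __name__ == "__main__":
    start = datetime(2021, 9, 2, 11, 28, tzinfo=timezone.utc)
    for when, message in repool_schedule("db2106", "T288803", start):
        print(when.strftime("%Y-%m-%dT%H:%MZ"), message)
```

Keeping the steps and interval as parameters makes it easy to model variants such as the later db2090 repool, which starts at 5%.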

The s4 codfw backup is ongoing now; once it is finished I will merge the backup patch.

Change 718938 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2090: Disable notifications

https://gerrit.wikimedia.org/r/718938

Change 718938 merged by Marostegui:

[operations/puppet@production] db2090: Disable notifications

https://gerrit.wikimedia.org/r/718938

Mentioned in SAL (#wikimedia-operations) [2021-09-06T05:07:47Z] <marostegui> Stop replication on db2090 (old s4 master) T289650 T288803

Change 718939 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db2090 to Buster

https://gerrit.wikimedia.org/r/718939

Change 718939 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db2090 to Buster

https://gerrit.wikimedia.org/r/718939

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2090.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202109060519_marostegui_3862.log.

Completed auto-reimage of hosts:

['db2090.codfw.wmnet']

and were ALL successful.

db2090 reimaged, checking its tables now

Change 715919 merged by Marostegui:

[operations/puppet@production] dbbackups: Migrate s4 generation from db2097 (stretch) to db2139 (buster)

https://gerrit.wikimedia.org/r/715919

db2090 came back clean - replication started

Mentioned in SAL (#wikimedia-operations) [2021-09-07T06:47:12Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2090 (re)pooling @ 5%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17228 and previous config saved to /var/cache/conftool/dbconfig/20210907-064711-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-07T07:02:16Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2090 (re)pooling @ 10%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17229 and previous config saved to /var/cache/conftool/dbconfig/20210907-070215-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-07T07:17:19Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2090 (re)pooling @ 25%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17232 and previous config saved to /var/cache/conftool/dbconfig/20210907-071719-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-07T07:32:23Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2090 (re)pooling @ 50%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17234 and previous config saved to /var/cache/conftool/dbconfig/20210907-073222-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-07T07:34:36Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Start to pool db2090 into API T288803', diff saved to https://phabricator.wikimedia.org/P17235 and previous config saved to /var/cache/conftool/dbconfig/20210907-073436-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-09-07T07:47:26Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2090 (re)pooling @ 75%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17237 and previous config saved to /var/cache/conftool/dbconfig/20210907-074726-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-07T07:49:02Z] <marostegui@cumin1001> dbctl commit (dc=all): 'More weight for db2090 into API T288803', diff saved to https://phabricator.wikimedia.org/P17238 and previous config saved to /var/cache/conftool/dbconfig/20210907-074901-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-09-07T08:02:30Z] <marostegui@cumin1001> dbctl commit (dc=all): 'db2090 (re)pooling @ 100%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17240 and previous config saved to /var/cache/conftool/dbconfig/20210907-080230-root.json

Mentioned in SAL (#wikimedia-operations) [2021-09-07T08:29:52Z] <marostegui@cumin1001> dbctl commit (dc=all): 'More weight for db2090 into API T288803', diff saved to https://phabricator.wikimedia.org/P17241 and previous config saved to /var/cache/conftool/dbconfig/20210907-082952-marostegui.json
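When reviewing a long run of SAL entries like the ones above, it can help to pull out the instance, percentage and timestamp of each repool step. The sketch below is one way to do that with a regular expression; the pattern and the `parse_repool` name are assumptions based only on the line format visible in this task, and entries with free-form messages (such as the API weight changes) are deliberately left unmatched.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the SAL lines in this task; not an official format.
SAL_REPOOL = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\]\s+"
    r"<(?P<user>[^>]+)>\s+dbctl commit \(dc=(?P<dc>[^)]+)\):\s+"
    r"'(?P<instance>db\d+(?::\d+)?) \(re\)pooling @ (?P<pct>\d+)%: (?P<reason>.*?)'"
)


def parse_repool(line):
    """Return a dict describing a repool step, or None if the line is not one."""
    m = SAL_REPOOL.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S")
    return {
        "time": ts.replace(tzinfo=timezone.utc),  # SAL timestamps end in Z (UTC)
        "user": m["user"],
        "instance": m["instance"],  # may include a port, e.g. db2138:3312
        "percent": int(m["pct"]),
        "reason": m["reason"],
    }
```

For example, feeding it the first db2090 entry would yield instance `db2090` at 5 percent, while the "Start to pool db2090 into API" entry returns `None`.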

Marostegui updated the task description.

@jcrespo all yours; only the backup cleanup is pending.

Leaving it up, but idle, until next week.

Change 720766 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: Remove s4 and s7 stretch backup source instances

https://gerrit.wikimedia.org/r/720766

Change 720767 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: Reenable notifications on stretch hosts after maintenance

https://gerrit.wikimedia.org/r/720767

Change 720766 merged by Jcrespo:

[operations/puppet@production] dbbackups: Remove s4 and s7 stretch backup source instances

https://gerrit.wikimedia.org/r/720766

Change 720767 merged by Jcrespo:

[operations/puppet@production] dbbackups: Reenable notifications on stretch hosts after maintenance

https://gerrit.wikimedia.org/r/720767

I've cleaned up puppet, icinga and config files of:

  • db1116
  • db1139
  • db2097
  • db2100

However, it is getting late, and I still have to clean up:

  • datadir
  • tendril
  • zarcillo
  • orchestrator
  • (final checks)

And I don't trust myself so late in the day. Sorry if this creates some noise (e.g. Prometheus scraping errors, Tendril red status, Orchestrator alerts) until I finish all tasks.

jcrespo reassigned this task from jcrespo to Marostegui.

Based on Prometheus and Tendril, I think this is done, but please do not hesitate to ping me if there is any unintended leftover.