We want to upgrade the clouddb* production hosts from MariaDB 10.4 to MariaDB 10.6.
For now, we are targeting the hosts belonging to s6:
- clouddb1021.eqiad.wmnet
- clouddb1019.eqiad.wmnet
- clouddb1015.eqiad.wmnet
| Status | Assigned | Task |
|---|---|---|
| Resolved | Marostegui | T334650 Migrate s6 to MariaDB 10.6 |
| Resolved | Marostegui | T334651 Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 |
Data-Engineering: when would be a good time to take around 10-15 minutes of downtime on clouddb1021? cc @BTullis
cloud-services-team: any objections from your side to this migration? I would depool one host at a time, so no user impact is expected during the migration of clouddb1019 and clouddb1015.
@Marostegui - you can upgrade clouddb1021 whenever it is convenient for you, this week or next.
I also have no objections to the work on clouddb1019 and clouddb1015, but I'm interested in how you will do the depooling.
Will you use a cookbook to modify the haproxy runtime configuration, or push a puppet config change, or something else?
I've recently been working on trying to improve the availability of wikireplicas by removing the SPOF on each of dbproxy1018 and dbproxy1019.
See https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Runbooks/Depool_wikireplicas#LVS_servers
However, it hasn't been very successful so far. In theory it should allow us to pool/depool a whole wikireplica cluster (web/analytics) and allow for maintenance of the dbproxy servers.
In practice it has caused problems with LVS/pybal and wikireplica outages, which I'm still trying to troubleshoot.
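For reference, a runtime depool of the kind mentioned above can be done through haproxy's admin socket. A minimal sketch, assuming a socket at /run/haproxy/admin.sock and a backend/server pair named mariadb/clouddb1019 (both illustrative, not the real production layout):

```sh
# Put one backend server into maintenance mode at runtime
# (socket path and backend/server names are assumptions)
echo "set server mariadb/clouddb1019 state maint" | \
    socat stdio /run/haproxy/admin.sock

# Bring it back once the maintenance is done
echo "set server mariadb/clouddb1019 state ready" | \
    socat stdio /run/haproxy/admin.sock
```

One caveat of the runtime approach is that the state is lost when haproxy reloads, which is part of why a Puppet-managed config change is the more durable option.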
@BTullis I was planning to depool the other hosts via the normal haproxy Puppet change, but I am happy to try other approaches if you want me to.
That's fine, thanks. You can go ahead with that, as far as I'm concerned. If you could cc me on the gerrit patches, I'd be grateful, but don't feel you need to wait for a +1 from me.
I'm just interested in all of the current wikireplica management processes and pondering if/how/when/why they could be improved. My first attempt at an improvement has had a slightly adverse result so far, so I need to have a rethink and review.
:-)
> cloud-services-team: any objections from your side to this migration?
I don't think we have any objections, cc @aborrero
Mentioned in SAL (#wikimedia-operations) [2023-07-26T06:34:07Z] <marostegui> Stop mariadb on clouddb1021 T334651
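For anyone following along, the upgrade itself happens between these log entries and follows the usual MariaDB major-version pattern. A rough sketch assuming stock Debian packaging; WMF production uses its own MariaDB packages, so the real package names and steps differ:

```sh
# Stop the server before swapping major versions
sudo systemctl stop mariadb

# Install the 10.6 packages (names are illustrative, not WMF's packaging)
sudo apt-get update
sudo apt-get install -y mariadb-server

# Start the new binaries and migrate the system tables
sudo systemctl start mariadb
sudo mariadb-upgrade
```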
Change 941551 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] clouddb1021: Migrate to MariaDB 10.6
Change 941551 merged by Marostegui:
[operations/puppet@production] clouddb1021: Migrate to MariaDB 10.6
clouddb1021 has been upgraded to 10.6. I will keep a close eye on it, but if you notice anything weird or get complaints about something, just let me know.
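A quick post-upgrade sanity check along these lines is useful here; a hedged sketch, since the actual production checks are likely more thorough:

```sh
# Confirm the running server reports a 10.6 version
sudo mysql -e "SELECT @@version;"

# Confirm replication threads are running and lag has recovered
sudo mysql -e "SHOW SLAVE STATUS\G" | grep -E "Running|Seconds_Behind"
```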
clouddb1015 has been migrated to 10.6. I'm leaving it for a few days before going for the last wikireplica of this section.
Change 950964 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] dbproxy1018: Depool clouddb1019
Change 950964 abandoned by Marostegui:
[operations/puppet@production] dbproxy1018: Depool clouddb1019
Reason: wrong branch
Change 950965 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] dbproxy1018: Depool clouddb1019
Change 950965 merged by Marostegui:
[operations/puppet@production] dbproxy1018: Depool clouddb1019
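Conceptually, a depool patch like the one above removes the host from the haproxy backend that Puppet manages on the dbproxy. An illustrative diff; the backend name, port, and check options are assumptions, not the real Puppet template:

```
 backend mariadb
     server clouddb1015 clouddb1015.eqiad.wmnet:3306 check
-    server clouddb1019 clouddb1019.eqiad.wmnet:3306 check
```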
Change 950973 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] clouddb1019: Migrate to MariaDB 10.6
Change 950973 merged by Marostegui:
[operations/puppet@production] clouddb1019: Migrate to MariaDB 10.6
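With clouddb1019 migrated, all three s6 wiki replicas are on 10.6, and the remaining step would be reverting the depool patch once the host looks healthy. A final spot-check sketch, using a plain ssh loop for illustration (production tooling may differ):

```sh
# Verify that all three s6 wiki replica hosts report a 10.6 version
for host in clouddb1015 clouddb1019 clouddb1021; do
    printf '%s: ' "${host}"
    ssh "${host}.eqiad.wmnet" 'sudo mysql -Nse "SELECT @@version;"'
done
```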