
Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6
Closed, Resolved, Public

Description

We want to upgrade the clouddb* production hosts from MariaDB 10.4 to MariaDB 10.6.
For now, the hosts in scope are the ones belonging to section s6:

  • clouddb1021.eqiad.wmnet
  • clouddb1019.eqiad.wmnet
  • clouddb1015.eqiad.wmnet

Event Timeline

Marostegui triaged this task as Medium priority. Apr 13 2023, 9:22 AM
Marostegui created this task.
Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui added a subscriber: BTullis.

Data-Engineering: when would be a good time to schedule around 10-15 minutes of downtime for clouddb1021? cc @BTullis
cloud-services-team: any objections from your side to this migration? I would depool one host at a time, so no user impact is expected during the migration of clouddb1019 and clouddb1015.
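The approach described above (migrate one host at a time, depool the haproxy-served hosts, agree a downtime window for clouddb1021) can be sketched as a small Python helper that builds the plan as data and checks the "one host at a time" property before anything is touched. This is a hypothetical illustration, not the tooling used in this task: `Replica`, `rolling_upgrade_plan`, `max_depooled_at_once` and the step names are invented for the example, and the `verify` and `repool` steps are assumptions about good practice rather than steps recorded here. The real depool was a puppet change to the dbproxy1018 haproxy configuration, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """A wikireplica host to migrate (hypothetical model for this sketch)."""

    name: str
    # True if the host can be depooled from haproxy before the upgrade;
    # False if it instead needs an agreed downtime window (as for clouddb1021).
    depoolable: bool


# A step is an (action, host) pair, e.g. ("depool", "clouddb1019.eqiad.wmnet").
Step = tuple[str, str]


def rolling_upgrade_plan(replicas: list[Replica]) -> list[Step]:
    """Build an ordered plan that takes exactly one host out at a time."""
    steps: list[Step] = []
    for r in replicas:
        # Take the host out of service: depool it, or use the agreed downtime.
        steps.append(("depool" if r.depoolable else "downtime", r.name))
        steps.append(("stop_mariadb", r.name))
        steps.append(("upgrade_to_10.6", r.name))
        # Assumption: check the host is healthy and replicating before moving on.
        steps.append(("verify", r.name))
        if r.depoolable:
            steps.append(("repool", r.name))
    return steps


def max_depooled_at_once(steps: list[Step]) -> int:
    """Replay the plan and return the peak number of simultaneously depooled hosts."""
    depooled: set[str] = set()
    peak = 0
    for action, host in steps:
        if action == "depool":
            depooled.add(host)
        elif action == "repool":
            depooled.discard(host)
        peak = max(peak, len(depooled))
    return peak


if __name__ == "__main__":
    # s6 hosts, in the order they were migrated in this task.
    s6 = [
        Replica("clouddb1021.eqiad.wmnet", depoolable=False),
        Replica("clouddb1015.eqiad.wmnet", depoolable=True),
        Replica("clouddb1019.eqiad.wmnet", depoolable=True),
    ]
    plan = rolling_upgrade_plan(s6)
    for action, host in plan:
        print(f"{action:16} {host}")
    print("max depooled at once:", max_depooled_at_once(plan))
```

Keeping the plan separate from its execution means the invariant (never more than one host depooled) can be asserted up front, which is the property that makes "no user impact" a reasonable expectation.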

Could I get an answer on this please?

@Marostegui - you can upgrade clouddb1021 whenever is convenient for you, this week or next.

I also have no objections to the work on clouddb1019 and clouddb1015, but I'm interested in how you are doing the depooling.
Will you use a cookbook to modify the haproxy runtime configuration, push a puppet config change, or something else?

I've recently been working on trying to improve the availability of wikireplicas by removing the SPOF on each of dbproxy1018 and dbproxy1019.
See https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Runbooks/Depool_wikireplicas#LVS_servers

However, it hasn't been very successful so far. In theory it should allow us to pool/depool a whole wikireplica cluster (web/analytics) and allow for maintenance of the dbproxy servers.
In practice it has caused problems with LVS/pybal and wikireplica outages, which I'm still trying to troubleshoot.

@BTullis I was planning to depool the other hosts via the usual haproxy puppet change, but I am happy to try other approaches if you want me to.

That's fine, thanks. You can go ahead with that, as far as I'm concerned. If you could cc me on the gerrit patches, I'd be grateful, but don't feel you need to wait for a +1 from me.
I'm just interested in all of the current wikireplica management processes and pondering if/how/when/why they could be improved. My first attempt at an improvement has had a slightly adverse result so far, so I need to have a rethink and review.
:-)

fnegri added subscribers: aborrero, fnegri.

cloud-services-team any objections from your side with this migration?

I don't think we have any objections, cc @aborrero

Looks good to me, +1

Mentioned in SAL (#wikimedia-operations) [2023-07-26T06:34:07Z] <marostegui> Stop mariadb on clouddb1021 T334651

Change 941551 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] clouddb1021: Migrate to MariaDB 10.6

https://gerrit.wikimedia.org/r/941551

Change 941551 merged by Marostegui:

[operations/puppet@production] clouddb1021: Migrate to MariaDB 10.6

https://gerrit.wikimedia.org/r/941551

Marostegui added a subscriber: taavi.

clouddb1021 has been upgraded to 10.6. I will keep a close eye on it, but if you notice anything odd or receive any complaints, just let me know.

clouddb1015 has been migrated to 10.6. I'll leave it for a few days before moving on to the last wikireplica of this section.

Change 950964 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1018: Depool clouddb1019

https://gerrit.wikimedia.org/r/950964

Change 950964 abandoned by Marostegui:

[operations/puppet@production] dbproxy1018: Depool clouddb1019

Reason:

wrong branch

https://gerrit.wikimedia.org/r/950964

Change 950965 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1018: Depool clouddb1019

https://gerrit.wikimedia.org/r/950965

Change 950965 merged by Marostegui:

[operations/puppet@production] dbproxy1018: Depool clouddb1019

https://gerrit.wikimedia.org/r/950965

Change 950973 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] clouddb1019: Migrate to MariaDB 10.6

https://gerrit.wikimedia.org/r/950973

Change 950973 merged by Marostegui:

[operations/puppet@production] clouddb1019: Migrate to MariaDB 10.6

https://gerrit.wikimedia.org/r/950973

Marostegui updated the task description.

This is done.