migrate services from cumin2001 to cumin2002
Closed, Resolved (Public)

Description

This task will track the migration of services from cumin2001 to cumin2002.

cumin2002 is being ordered via T276383 and racked via T276587.

Once racking is complete, services on cumin2001 (and related network rules) will need to be migrated to cumin2002.

Event Timeline

RobH triaged this task as Medium priority. Mar 5 2021, 4:19 PM
RobH created this task.
MoritzMuehlenhoff removed projects: SRE-tools, netops.

I'll take care of this once the new server is racked.

Mentioned in SAL (#wikimedia-operations) [2021-04-20T12:58:04Z] <moritzm> reimaging cumin2002 to bullseye T276589

Change 681404 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Make cumin2002 a Cumin host

https://gerrit.wikimedia.org/r/681404

Change 681404 merged by Muehlenhoff:

[operations/puppet@production] Make cumin2002 a Cumin host

https://gerrit.wikimedia.org/r/681404

Change 685537 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] Add python_deploy_venv class

https://gerrit.wikimedia.org/r/685537

Change 685817 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add cumin2002 as cumin master and allow for tcpircbot and ganeti/rapi

https://gerrit.wikimedia.org/r/685817

Change 685817 merged by Muehlenhoff:

[operations/puppet@production] Add cumin2002 as cumin master and allow for tcpircbot and ganeti/rapi

https://gerrit.wikimedia.org/r/685817

Change 685537 merged by Volans:

[operations/puppet@production] Add python_deploy::venv class

https://gerrit.wikimedia.org/r/685537

Change 686393 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Install 10.5 client on host with only client packages

https://gerrit.wikimedia.org/r/686393

Change 688195 had a related patch set uploaded (by Volans; author: Volans):

[operations/homer/public@master] firewall: add cumin2002 to the cumin term

https://gerrit.wikimedia.org/r/688195

Change 688195 merged by jenkins-bot:

[operations/homer/public@master] firewall: add cumin2002 to the cumin term

https://gerrit.wikimedia.org/r/688195

Change 686393 merged by Muehlenhoff:

[operations/puppet@production] mariadb: Install 10.5 client on host with only client packages

https://gerrit.wikimedia.org/r/686393

Change 692621 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add grant for cumin2002

https://gerrit.wikimedia.org/r/692621

Change 693130 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Skip Cumin/Homer/Spicerack on cumin2001

https://gerrit.wikimedia.org/r/693130

Change 692621 merged by Muehlenhoff:

[operations/puppet@production] Add grant for cumin2002

https://gerrit.wikimedia.org/r/692621

Mentioned in SAL (#wikimedia-operations) [2021-05-21T08:56:24Z] <kormat> deploying cumin2002 grants to production T276589

Change 693376 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add cumin2002 to mysql_root_clients

https://gerrit.wikimedia.org/r/693376

The grant for cumin2002 should now be fully deployed.

Change 693376 merged by Muehlenhoff:

[operations/puppet@production] Add cumin2002 to mysql_root_clients

https://gerrit.wikimedia.org/r/693376

Change 693130 merged by Muehlenhoff:

[operations/puppet@production] Skip Cumin/Homer/Spicerack on cumin2001

https://gerrit.wikimedia.org/r/693130

If it's not too much trouble, it would be nice if cumin2001 could have a MOTD pointing you to cumin2002. If you accidentally log into cumin2001 you'll end up trying to run cookbooks that haven't been updated since May :/

We'll ditch cumin2001 very soon, it was only kept around for DBA purposes during the switchover window.

That is (surprising) news to us :)

We're currently still migrating hosts from stretch to buster, hoping to finish that next quarter. So currently we need to support both of those releases. Once we get rid of stretch, we can look at supporting bullseye, too. But that implies keeping buster cumin hosts around until at least q1 of the next (calendar) year.

(CC: @Marostegui, @LSobanski)

I thought it was going to be kept around for as long as we needed (obviously with a reasonable timeline). So this is totally new (to me at least)

So, it seems there was a misunderstanding among the various people involved about the expectations connected to the switchdc.

@Marostegui @Kormat To make some progress here, it would be nice if you could outline on this task what the current blockers are from your point of view, and if/how @MoritzMuehlenhoff and I could help you.
I know we discussed some of that on IRC at the time of the switch to codfw, but I have honestly forgotten the details by now, as too much time has passed.
From there we could list next steps and a rough timeline that would suit everyone involved, if that works for you.

Talked to @MoritzMuehlenhoff on IRC; we are going to wait 2 weeks to have a meeting to sync up about this once @Kormat and @LSobanski are back

As discussed in the meeting, cumin2001 will remain in service until DB tooling is packaged for Bullseye (ETA: mid-Q3 FY21/22). At that point cumin1001 will also be reimaged to Bullseye.

Any update on this? This upgrade is blocking serviceops, who need bullseye for the kubernetes python libraries and cookbooks.

You can use cumin2002; it has been running Bullseye for quite a while and I'm using it for all my spicerack and cumin needs without any issues. cumin2001 is EOLed hardware and will be decommissioned when the DBAs are done.

Because people have been randomly running cookbooks from cumin2001 with unknown results, I've manually edited /usr/bin/cookbook to prevent execution for now.
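A guard of this kind could look roughly like the sketch below. This is only an illustration: the task does not say how /usr/bin/cookbook was actually edited, and the function names, the host mapping, and the message text are all hypothetical.

```python
import socket
import sys

# Hypothetical mapping of retired hosts to their replacements. Only the
# cumin2001 -> cumin2002 pair is taken from this task.
RETIRED_HOSTS = {"cumin2001": "cumin2002"}


def refusal_message(hostname, retired=RETIRED_HOSTS):
    """Return an error message if the host is retired, otherwise None."""
    # Compare on the short name so both "cumin2001" and an FQDN match.
    short_name = hostname.split(".")[0]
    replacement = retired.get(short_name)
    if replacement is None:
        return None
    return (
        f"{short_name} is no longer an active Cumin host; "
        f"run cookbooks from {replacement} instead."
    )


def main(hostname=None):
    """Refuse to continue on a retired host; return a process exit code."""
    message = refusal_message(hostname or socket.gethostname())
    if message is not None:
        print(message, file=sys.stderr)
        return 1
    # On an active host the real cookbook entry point would run here.
    return 0
```

A wrapper script would end with `sys.exit(main())`, so that an accidental run on the old host fails immediately with a pointer to the new one instead of executing stale cookbooks.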

I'm working on the wmfdb + wmfmariadbpy sides of this for data-persistence. wmfdb is good to go; I'm currently doing some testing with wmfmariadbpy in pontoon in preparation for doing a new release in the next day or so.

The other 2 pieces we care about are transfer.py and wmfbackups; @jcrespo is handling those.

Released and deployed (T302796: Deploy wmfmariadbpy 0.9).

Change 767181 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: Setup x1 snapshots on cumin2002

https://gerrit.wikimedia.org/r/767181

Change 767181 merged by Jcrespo:

[operations/puppet@production] dbbackups: Setup x1 snapshots on cumin2002

https://gerrit.wikimedia.org/r/767181

Change 767212 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: Migrate codfw DB snapshot orchestration from cumin2001 to 2002

https://gerrit.wikimedia.org/r/767212

Change 767212 merged by Jcrespo:

[operations/puppet@production] dbbackups: Migrate codfw DB snapshot orchestration from cumin2001 to 2002

https://gerrit.wikimedia.org/r/767212

The x1 backup worked as expected, with normal performance. I just uploaded the 0.6 packages and migrated backups to cumin2002. I will do another large backup test, but other than tagging the release and waiting for the full backup cycle tonight, everything looks fine.

This means: cumin2002 has been converted into the backup orchestration host and already has the right config. cumin2001 is now a passive (disabled) backup orchestration host. If cumin1001 gets reimaged to bullseye, the right setup will be available on reimage. Only if a new host (e.g. a cumin1002) is created will we need a trivial deployment, as backup config is per host, not per profile.

TL;DR: unless backups fail spectacularly tonight, no more work is needed on my side (except pending bugfixes and improvements that couldn't be done for this release).

Backups worked without errors tonight, all migration work done and ready to upgrade the backup hosts next.

@MoritzMuehlenhoff both DB and Backup tooling work is completed so at this point we are ready to go ahead and upgrade the Cumin hosts.

Ack, thanks! We'll probably go ahead with this next week, first by removing cumin2001 and then reimaging cumin1001.

Change 768657 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove cumin2001 from mysql root clients and related grants

https://gerrit.wikimedia.org/r/768657

Apart from merging this, we also need to remove the grants from production (@Kormat would you be able to take care of that?)

Sure.

Change 768670 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Switch cumin2001 to insetup role

https://gerrit.wikimedia.org/r/768670

Change 768657 merged by Kormat:

[operations/puppet@production] Remove cumin2001 from mysql root clients and related grants

https://gerrit.wikimedia.org/r/768657

Mentioned in SAL (#wikimedia-operations) [2022-03-07T14:00:29Z] <kormat> removing cumin2001 grants from all db sections T276589

Grants removed:

  • es1
  • es2
  • es3
  • es4
  • es5
  • m1
  • m2
  • m3
  • m5
  • s1
  • s2
  • s3
  • s4
  • s5
  • s6
  • s7
  • s8
  • db_inventory
  • x1
  • x2
  • pc1
  • pc2
  • pc3
  • mediabackupstemp
  • A:db-core-test

Alright, that should be all the grants cleaned up.

Change 768670 merged by Muehlenhoff:

[operations/puppet@production] Switch cumin2001 to insetup role

https://gerrit.wikimedia.org/r/768670

cumin2002 is the active Cumin host in codfw; the decommission of cumin2001 happens via https://phabricator.wikimedia.org/T303399