
labstore1004/5 - Reconfigure the DRBD interfaces to run across the 10G cable
Closed, Resolved, Public

Description

The second 10G interfaces on the primary cluster are now confirmed connected. We need to work out how to change the interface the DRBD devices use as safely as possible and with minimal downtime, so that write traffic can run at full speed on the cluster.

For testing, 192.168.0.3 and 192.168.0.4 are assigned to the interfaces (enp133s0f1 on both). Those will likely need to switch to the existing addresses without corrupting the data.

Event Timeline

Change 690563 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] labstore: Switch DRBD devices to using the 10Gb addresses

https://gerrit.wikimedia.org/r/690563

Change 690563 merged by Bstorm:

[operations/puppet@production] labstore: Switch DRBD devices to using the 10Gb addresses

https://gerrit.wikimedia.org/r/690563

Drat. It's not going to be this easy. I believe there still needs to be a "crossover" adjustment to the cable end to make this work. Reverting and making a subtask.

Bstorm changed the task status from Open to Stalled. May 13 2021, 5:30 PM

Waiting for cables, but that went very well...as did rollback.

Change 691254 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] labstore: Switch DRBD devices to using the 10Gb addresses

https://gerrit.wikimedia.org/r/691254

Bstorm changed the task status from Stalled to Open. May 14 2021, 6:06 PM

We are trying again with some new optics.

Change 691254 merged by Bstorm:

[operations/puppet@production] labstore: Switch DRBD devices to using the 10Gb addresses

https://gerrit.wikimedia.org/r/691254

Change 691262 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] cloud nfs: fix the netmask for swapping cables

https://gerrit.wikimedia.org/r/691262

Change 691262 merged by Bstorm:

[operations/puppet@production] cloud nfs: fix the netmask for swapping cables

https://gerrit.wikimedia.org/r/691262

It works! It was me. I forgot it was using a restrictive CIDR.
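
For anyone who hits the same thing, here is a minimal sketch of how a too-narrow prefix can hide the peer. It uses the test addresses from the description; the prefix lengths (/30 and /24) are assumptions for illustration only, not the values from the actual patch, and `same_subnet` is a hypothetical helper.

```python
import ipaddress


def same_subnet(local: str, peer: str, prefix_len: int) -> bool:
    """Return True if `peer` is inside the network implied by local/prefix_len."""
    # ip_interface keeps the host address and derives the enclosing network.
    network = ipaddress.ip_interface(f"{local}/{prefix_len}").network
    return ipaddress.ip_address(peer) in network


if __name__ == "__main__":
    # Assumed prefix lengths, chosen only to show the effect.
    for prefix_len in (30, 24):
        reachable = same_subnet("192.168.0.3", "192.168.0.4", prefix_len)
        print(f"/{prefix_len}: peer in local subnet = {reachable}")
```

With a /30, 192.168.0.3 sits in 192.168.0.0/30 and 192.168.0.4 falls in the next block, so the peer does not look local; with a /24 both addresses share one network.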

labstore1004 and 5 are now a fully 10G cluster. I can now fully elevate the limits on NFS clients to reflect that.