Move labstore1004 and labstore1005 to 10G Ethernet
Closed, Resolved (Public)

Description

This task tracks progress on both servers. Each server will have its own task, since the moves need to be done rather carefully, one at a time, to ensure minimal interruption of services. labstore1004/5 (the primary NFS cluster) both have unused dual-port 10G interfaces (T127508). The primary traffic interface moves first, and then the crossover cable.

labstore1004 is in rack C2, which I believe is a 10G rack, though it seems fairly busy. If switch ports are available, that makes that server somewhat simpler, though the crossover cable would need to be moved to the second 10G port on the server as well.

labstore1005 is in rack C5, which means it needs to be moved to a different rack altogether in order to set it up as 10G.

labstore1005 requires stopping replication from 1004 and stopping the backup jobs from running before shutdown.
labstore1004 requires a failover to labstore1005 to proceed.

I believe that the procedure used to re-image cloudvirt1019 and 1020 could be used in order to preserve the data volumes.

  • Reconnect the DRBD "crossover" cable setup between both servers with 10G interfaces to bring write performance to full levels.

Event Timeline

Bstorm created this task.

To validate the move, check 'drbd-overview' output before and after.
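One way to do that before/after check is to save the command output to a file at each point and diff the two files. The following is a minimal sketch, not the procedure actually used here; the function name `diff_snapshots` and the file names in the usage note are hypothetical, and it makes no assumption about what the 'drbd-overview' output looks like.

```python
import difflib
import sys
from typing import List


def diff_snapshots(before: str, after: str) -> List[str]:
    """Return unified-diff lines between two saved command outputs.

    An empty list means the two snapshots are identical.
    """
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_snapshots.py before.txt after.txt
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        changes = diff_snapshots(f_before.read(), f_after.read())
    print("\n".join(changes) if changes else "no differences")
```

The snapshots would be captured by redirecting the command output to a file before the move and again afterwards. Any line that differs shows up prefixed with `-` or `+`.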

Andrew changed the task status from Open to Stalled. Nov 18 2020, 8:07 PM

This is stalled pending available 10G rackspace in eqiad. The Tetris game there is well underway.

Cmjohnson subscribed.

This is now a duplicate task; we have a few for the same thing. I am resolving this one.

This isn't really a duplicate. It was the overall tracking ticket for the individual systems, configuration and also the effort to make DRBD work over a 10G connection. I'll make sure that's clear on it since the ticket wasn't all that clear to start with.

All moves were successful for the traffic interfaces. The DRBD interface is still replicating, but it will heavily limit the write capabilities of the server (the replication is synchronous) until it is also at 10G.
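To illustrate why the replication link matters: with synchronous replication, a write is not complete until the peer has it, so sustained write throughput cannot exceed what the replication link can carry. A rough sketch of the arithmetic follows. The local disk figure of 800 MB/s is a made-up number for illustration, not a measurement from these servers, and protocol overhead is ignored.

```python
def link_ceiling_mb_per_s(link_gbit: float) -> float:
    """Raw link capacity in MB/s (decimal units, no protocol overhead)."""
    return link_gbit * 1000 / 8


def sync_write_ceiling(local_mb_per_s: float, link_gbit: float) -> float:
    """Upper bound on sustained synchronous write throughput.

    The slower of the local storage and the replication link sets the limit.
    """
    return min(local_mb_per_s, link_ceiling_mb_per_s(link_gbit))


# Hypothetical local storage throughput, for illustration only.
LOCAL_MB_PER_S = 800.0

for gbit in (1, 10):
    ceiling = sync_write_ceiling(LOCAL_MB_PER_S, gbit)
    print(f"{gbit}G replication link: at most {ceiling:.0f} MB/s")
```

Under these assumptions a 1G link caps writes at 125 MB/s regardless of how fast the disks or the 10G traffic interface are, while a 10G link (1250 MB/s raw) moves the limit back to the local storage.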

I think this just needs a long enough cable between the two if the connectors are SFP+, per T266192#6710545.

Reassigning to Brooke for DRBD things.

The move has taken place. If you have work to do outside of the data center, please re-open and remove the ops-eqiad and dc-ops tags.

Reviewing old tickets: this one includes the 10G link for DRBD, and that is not yet done (though the moves are done). The DRBD link is still connected at 1G, which restricts write performance badly. We need to get that moved to 10G (possibly with a new cable that uses SFP+ instead) so that the synchronous replication runs over 10G. Otherwise, these servers basically cannot fully use 10G.

The new cable is connected and confirmed working. I'll make a new task for the reconfig and retiring of the old cable.

Change 690563 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] labstore: Switch DRBD devices to using the 10Gb addresses

https://gerrit.wikimedia.org/r/690563

Change 690563 merged by Bstorm:

[operations/puppet@production] labstore: Switch DRBD devices to using the 10Gb addresses

https://gerrit.wikimedia.org/r/690563