
Move cloudvirt hosts to 10Gb ethernet
Closed, Resolved, Public

Description

Most of our cloudvirt hosts have idle 10Gb NICs. We haven't used them historically because their row was short on ports. Now that the switches have been upgraded, we can start moving these hosts over to 10Gb.

Each cloudvirt uses two NICs. It would be nice to move both to 10Gb, but if that's somehow difficult or expensive we can leave the control plane on 1Gb.
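Whether a given NIC is actually running at 1Gb or 10Gb can be read from the link speed Linux reports in sysfs. The following is a minimal sketch, assuming a Linux host where `/sys/class/net/<iface>/speed` holds the negotiated speed in Mb/s; the helper names are hypothetical and this is not the tooling the team used for this migration.

```python
from pathlib import Path
from typing import Dict, List, Optional

TEN_GB_MBPS = 10_000  # sysfs reports speed in Mb/s, so 10Gb is 10000


def nic_speeds(sysfs_net: str = "/sys/class/net") -> Dict[str, Optional[int]]:
    """Map each non-loopback interface to its link speed in Mb/s.

    A down link typically makes the read fail (EINVAL) or report -1;
    both cases are returned as None.
    """
    speeds: Dict[str, Optional[int]] = {}
    for iface in sorted(Path(sysfs_net).iterdir()):
        # Skip loopback and plain files such as bonding_masters.
        if iface.name == "lo" or not iface.is_dir():
            continue
        try:
            mbps = int((iface / "speed").read_text().strip())
        except (OSError, ValueError):
            mbps = -1
        speeds[iface.name] = mbps if mbps > 0 else None
    return speeds


def below_10g(speeds: Dict[str, Optional[int]]) -> List[str]:
    """Names of interfaces whose link is up but slower than 10Gb."""
    return [
        name
        for name, mbps in speeds.items()
        if mbps is not None and mbps < TEN_GB_MBPS
    ]
```

Running `below_10g(nic_speeds())` on a cloudvirt would list any interface still negotiating at 1Gb; an interface reported as `None` has no link, which is how a host with only one of two 10G ports cabled would show up.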

  • Determine which servers have unused 10Gb nics
    • all of them, it seems
  • Verify with dc-ops that there are abundant ports
    • only in racks 2, 4, and 7
  • Write a plan for switching over a given host (presumably as part of a rebuild)

Please note that each host will have a sub-task linked to this task for its actual relocation/recabling/reimaging, since each host involves multiple steps. This primary tracking task's description will simply summarize overall status.

  • cloudvirt1001 - currently in rack b3 - empty and depooled -- T221141
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1002 - currently in rack b3 - empty and depooled -- T221140
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1003 - currently in rack b3 - empty and depooled -- T221139
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1004 - currently in rack b5 - empty and depooled -- T221138
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1005 - currently in rack b5 - empty and depooled -- T221049
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1006 - currently in rack b5 - empty and depooled -- T221048
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1007 - currently in rack 5 - empty and depooled -- T221047
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1008 -- currently empty and depooled -- T216661
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1009 -- currently empty and depooled -- T216324
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1012 -- currently empty and depooled -- T217346
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1013 -- T243414
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1014 - currently in rack b5 -- T226188
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1015 -- currently empty and depooled -- T217140
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1016 - currently in rack b4 - empty and depooled -- T228692
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1017 - currently in rack b7 - empty and depooled -- T228691
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1018 -- T217347
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack - 1 of 2 interfaces connected
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1019
    • system was deployed with 10G; nothing to do here.
  • cloudvirt1020
    • system was deployed with 10G; nothing to do here.
  • cloudvirt1021 - currently in rack b4 -- T229873
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1022 - currently in rack b7 -- T229872
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1023 -- T229871
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1024 -- currently empty and depooled -- T216724
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team)
    • system relocated to 10G interfaces/rack - 1 of 2 interfaces connected
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1025 -- T266187
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack, PXE boot confirmed functional, system powered off until the reimage step below
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1026 - relocated -- T266281
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1027 - relocated -- T266369
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1028 - relocated -- T266514
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1029 - relocated -- T266206
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
  • cloudvirt1030 - relocated -- T266623
    • system drained of traffic on 1G, ready for relocation (this checkbox should only be checked by cloud-services-team & only when a sub-task has been populated for this server)
    • system relocated to 10G interfaces/rack
    • system reimaged
    • system reintroduced into service cluster
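The four per-host checkboxes form an ordered workflow, with an access rule attached to the first step. The following is a minimal sketch of that workflow, assuming the stricter wording used for most hosts (the drain step requires cloud-services-team and a populated sub-task). The class, field, and team-name strings are hypothetical; in practice this state lived in Phabricator checkboxes and sub-tasks, not in code.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional, Set

CLOUD_SERVICES_TEAM = "cloud-services-team"


class Step(IntEnum):
    """Per-host checklist items, in the order they must be completed."""
    DRAINED = 1       # drained of traffic on 1G, ready for relocation
    RELOCATED = 2     # moved to 10G interfaces/rack
    REIMAGED = 3
    REINTRODUCED = 4  # back in the service cluster


@dataclass
class HostMigration:
    host: str
    subtask: Optional[str] = None  # Phabricator sub-task ID, e.g. "T221141"
    done: Set[Step] = field(default_factory=set)

    def check(self, step: Step, team: str) -> None:
        """Tick a checkbox, enforcing step order and the drain-step rule."""
        if step is Step.DRAINED:
            # Only cloud-services-team may mark a host drained, and only
            # after a relocation sub-task exists for it.
            if team != CLOUD_SERVICES_TEAM:
                raise PermissionError(
                    f"{self.host}: only {CLOUD_SERVICES_TEAM} may mark a host drained")
            if self.subtask is None:
                raise ValueError(
                    f"{self.host}: populate a sub-task before marking it drained")
        # Every earlier step must already be ticked.
        missing = [s for s in Step if s < step and s not in self.done]
        if missing:
            raise ValueError(f"{self.host}: {step.name} requires {missing[0].name} first")
        self.done.add(step)

    @property
    def complete(self) -> bool:
        return self.done == set(Step)
```

Encoding the steps as an `IntEnum` keeps the ordering check to a single comparison; hosts that were deployed on 10G from the start (cloudvirt1019 and cloudvirt1020) would simply never get a `HostMigration` entry.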

Related Objects

Status    Assigned
Open      None
Resolved  dcaro
Resolved  Cmjohnson
Resolved  Andrew
Resolved  Vgutierrez
Resolved  Andrew
Resolved  Cmjohnson
Resolved  Cmjohnson
Resolved  Cmjohnson
Resolved  Andrew
Resolved  aborrero
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Cmjohnson
Resolved  Cmjohnson
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Cmjohnson
Resolved  ayounsi
Resolved  aborrero
Resolved  Cmjohnson
Resolved  Jclark-ctr
Resolved  Jclark-ctr
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Resolved  dcaro
Resolved  aborrero
Declined  dcaro
Resolved  dcaro
Open      None
Open      None
Resolved  Andrew
Resolved  Andrew
Resolved  Andrew
Invalid   Andrew
Resolved  RobH
Resolved  Andrew
Resolved  RobH
Resolved  RobH
Resolved  RobH
Resolved  RobH
Resolved  RobH
Resolved  Andrew

Event Timeline


Change 528231 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] nova: update scheduler pool

https://gerrit.wikimedia.org/r/528231

Change 528231 merged by Andrew Bogott:
[operations/puppet@production] nova: update scheduler pool

https://gerrit.wikimedia.org/r/528231

Ottomata added a subscriber: Ottomata.

@Andrew, assigning to you as you seem to be leading this parent task. Feel free to undo or reassign as necessary.

This is waiting for Ceph so that we can move/rebuild hosts without user downtime.

Note that racks C8 and D5 are now dedicated to WMCS servers (including cloudvirts), so please move servers there when possible.

Mentioned in SAL (#wikimedia-cloud) [2020-10-25T16:20:39Z] <andrewbogott> adding cloudvirt1038 to the 'ceph' aggregate and removing from the 'spare' aggregate. We need this space while waiting on network upgrades for empty cloudvirts (T216195)
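The aggregate change in this SAL entry (add a hypervisor to one Nova host aggregate, then remove it from another) maps onto two standard OpenStack client calls, `openstack aggregate add host <aggregate> <host>` and `openstack aggregate remove host <aggregate> <host>`. The sketch below only builds and optionally runs those commands; it assumes the `openstack` CLI is installed with credentials already loaded, and that the Nova host name is the short hostname shown in the log. The helper names are hypothetical, not part of any WMCS tooling.

```python
import shlex
import subprocess
from typing import List


def aggregate_swap_commands(host: str, add_to: str, remove_from: str) -> List[List[str]]:
    """Build openstack CLI calls that move a hypervisor between host aggregates.

    The host is added to the new aggregate before it is removed from the
    old one, matching the order in the SAL entry.
    """
    return [
        ["openstack", "aggregate", "add", "host", add_to, host],
        ["openstack", "aggregate", "remove", "host", remove_from, host],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print (and, unless dry_run, execute) each command; return them as shell strings."""
    rendered = []
    for argv in commands:
        line = shlex.join(argv)
        print(("DRY RUN: " if dry_run else "") + line)
        if not dry_run:
            # check=True stops before the removal if the add step fails.
            subprocess.run(argv, check=True)
        rendered.append(line)
    return rendered
```

For the entry above, `run(aggregate_swap_commands("cloudvirt1038", add_to="ceph", remove_from="spare"))` prints the two commands without executing them; pass `dry_run=False` to apply them.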

@Cmjohnson, in case you were waiting to do all these in bulk: all remaining cloudvirts are now ready for upgrade.

Change 644886 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] cloudvirt10[26-30] mac updates

https://gerrit.wikimedia.org/r/644886

Change 644886 merged by RobH:
[operations/puppet@production] cloudvirt10[26-30] mac updates

https://gerrit.wikimedia.org/r/644886

Please note that I've corrected the MAC entries for cloudvirt10[25-30]; they were all off by two characters. This was likely caused by a script polling the wrong port on the NIC (there are four of them) across the whole relocation group. This is a very, very easy mistake to make (I know I've done it myself in the past). The entries are now correct, and I've resolved the relocation sub-tasks.
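A recorded MAC that belongs to a neighbouring port on the same card usually differs from the correct one only in its last hex digits, which is why this kind of mistake is easy to miss by eye. The following is a minimal sketch of a sanity check, assuming (as is common but not guaranteed) that a multi-port NIC assigns consecutive MACs to its ports. The function names and the example addresses in the usage note are hypothetical.

```python
from typing import List

HEX_DIGITS = set("0123456789abcdef")


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and drop ':', '-' and '.' separators."""
    digits = mac.lower().replace(":", "").replace("-", "").replace(".", "")
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def differing_digits(recorded: str, observed: str) -> List[int]:
    """0-based positions of the hex digits that differ between two MACs."""
    a, b = normalize_mac(recorded), normalize_mac(observed)
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def looks_like_sibling_port(recorded: str, observed: str, ports: int = 4) -> bool:
    """True if the MACs differ but are close enough to be another port on the same NIC.

    Assumes the NIC numbers its ports with consecutive MACs.
    """
    a = int(normalize_mac(recorded), 16)
    b = int(normalize_mac(observed), 16)
    return a != b and abs(a - b) < ports
```

For example, a recorded `aa:bb:cc:dd:ee:0f` against an observed `aa:bb:cc:dd:ee:11` differs in exactly two hex digits (positions 10 and 11) and is two ports away, so `looks_like_sibling_port` flags it as a likely wrong-port reading rather than an unrelated card.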

Cloudvirt10[25-30] are now ready for reimaging. Per an IRC conversation between @Andrew and myself, the next step is for either @Andrew or @dcaro to reimage these hosts later today. If any other issues arise, please don't hesitate to ping me on IRC and I can assist.

Change 644943 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Cloudvirt1025-1030: Update nic names for 10Gb/Buster

https://gerrit.wikimedia.org/r/644943

Change 644943 merged by Andrew Bogott:
[operations/puppet@production] Cloudvirt1025-1030: Update nic names for 10Gb/Buster

https://gerrit.wikimedia.org/r/644943

Mentioned in SAL (#wikimedia-cloud) [2020-12-03T16:38:44Z] <dcaro> Re-imaging cloudvirt1026 (T216195)

Mentioned in SAL (#wikimedia-cloud) [2020-12-07T14:49:45Z] <dcaro> Re-imaging cloudvirt1027 (T216195)

Mentioned in SAL (#wikimedia-cloud) [2020-12-07T15:55:21Z] <andrewbogott> reimaging cloudvirt1028 for T216195

Mentioned in SAL (#wikimedia-cloud) [2020-12-08T12:14:24Z] <dcaro> Re-imaging cloudvirt1029 (T216195)

Mentioned in SAL (#wikimedia-cloud) [2020-12-08T14:13:45Z] <dcaro> Host re-imaged, doing tests cloudvirt1029 (T216195)

Mentioned in SAL (#wikimedia-cloud) [2020-12-08T14:18:35Z] <dcaro> Host online cloudvirt1029 (T216195)

Mentioned in SAL (#wikimedia-cloud) [2020-12-08T15:59:06Z] <dcaro> Re-imaging host cloudvirt1030 (T216195)

Mentioned in SAL (#wikimedia-cloud) [2020-12-08T18:01:05Z] <dcaro> Host cloudvirt1030 up and running (T216195)

dcaro updated the task description.