
Moving 1G servers out of rack D4 in prep of switch migration
Closed, Resolved, Public

Description

These servers need to be migrated out of rack D4. All of the servers are mw servers. There are no mw servers in D1 or D8. D8 has more room and would be easier to re-rack/cable in one go.

Note: These servers are 6 years old (purchase date 02-20-2018). They were refreshed in Jan/Feb 2023, but the team that owns them wants to keep them around until the end of the k8s transition. It's not clear whether that transition will be completed before the migration.

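| server | port | move to |
| --- | --- | --- |
| mw2281 | ge-4/0/3 | D8-U32 |
| mw2282 | ge-4/0/4 | D8-U33 |
| mw2283 | ge-4/0/5 | D8-U34 |
| mw2284 | ge-4/0/6 | D8-U35 |
| mw2285 | ge-4/0/7 | D8-U36 |
| mw2286 | ge-4/0/8 | D8-U37 |
| mw2287 | ge-4/0/9 | D8-U38 |
| mw2288 | ge-4/0/10 | D8-U39 |
| mw2289 | ge-4/0/11 | D8-U40 |
| mw2290 | ge-4/0/12 | D8-U41 |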
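For anyone double-checking the plan before the move, here is a minimal sketch that puts the table above into a Python mapping and checks that no two servers share a port or a target rack unit. The `port` column is treated as an opaque label, and the helper names (`MOVE_PLAN`, `find_conflicts`, `port_number`, `unit_number`) are hypothetical, not part of any existing tooling.

```python
# Move plan copied from the task table: server -> (port, target rack-unit).
MOVE_PLAN = {
    "mw2281": ("ge-4/0/3", "D8-U32"),
    "mw2282": ("ge-4/0/4", "D8-U33"),
    "mw2283": ("ge-4/0/5", "D8-U34"),
    "mw2284": ("ge-4/0/6", "D8-U35"),
    "mw2285": ("ge-4/0/7", "D8-U36"),
    "mw2286": ("ge-4/0/8", "D8-U37"),
    "mw2287": ("ge-4/0/9", "D8-U38"),
    "mw2288": ("ge-4/0/10", "D8-U39"),
    "mw2289": ("ge-4/0/11", "D8-U40"),
    "mw2290": ("ge-4/0/12", "D8-U41"),
}


def find_conflicts(plan):
    """Return human-readable conflicts: a port or target unit used by two servers."""
    conflicts = []
    port_owner = {}
    unit_owner = {}
    for server, (port, unit) in sorted(plan.items()):
        if port in port_owner:
            conflicts.append(f"{server} and {port_owner[port]} both use port {port}")
        else:
            port_owner[port] = server
        if unit in unit_owner:
            conflicts.append(f"{server} and {unit_owner[unit]} both target {unit}")
        else:
            unit_owner[unit] = server
    return conflicts


def port_number(port):
    """'ge-4/0/3' -> 3 (last component of the interface name)."""
    return int(port.rsplit("/", 1)[1])


def unit_number(location):
    """'D8-U32' -> 32."""
    return int(location.split("-U", 1)[1])


if __name__ == "__main__":
    print(find_conflicts(MOVE_PLAN))  # [] for the plan as written
    offsets = {unit_number(u) - port_number(p) for p, u in MOVE_PLAN.values()}
    print(offsets)  # {29}: each target unit is the port number plus 29
```

The constant offset is only an observation about this particular table. The collision check is the part that would catch a typo if the plan were edited by hand.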

Event Timeline

As these servers are up for decom, they won't be migrated to k8s, and they are in the current secondary datacenter. It doesn't really matter to us which rack you move them to, so D8 is fine if it's the easiest for y'all. @Papaul How long do you estimate the move would take?

The move should be something like:
serviceops:

  1. Depool the servers
  2. Downtime the servers

ops-codfw:

  1. Move the servers to their new location

serviceops:

  1. Check the servers boot back ok and httpbb clears
  2. Remove downtime
  3. Repool the servers

Please tell us if something is missing from the above, and around what day/time you'd want to do that move, and we'll find someone available for it.

Change #1037466 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] apiserver: colocate API canary with scap proxy

https://gerrit.wikimedia.org/r/1037466

Change #1037466 merged by Hnowlan:

[operations/puppet@production] apiserver: colocate API canary with scap proxy

https://gerrit.wikimedia.org/r/1037466

mw2282 is a Kubernetes node, so it would need to be drained and cordoned as well. However, since these servers are due to be decommissioned and are in the secondary datacenter, we don't need them for capacity, and we are discussing decommissioning them now so you don't have to move them. What would be the deadline for a decision?
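The task doesn't say which tooling was used for the drain. In upstream Kubernetes terms, draining and cordoning a node roughly corresponds to the kubectl commands built below. This is a sketch under two assumptions: the node name is the host FQDN that appears in the SAL entries (`mw2282.codfw.wmnet`), and plain `kubectl` is used rather than any site-specific wrapper. The helper names are hypothetical, and the code only builds argument lists without running anything.

```python
from typing import List

# Assumption: the Kubernetes node name is the host FQDN, e.g.
# "mw2282.codfw.wmnet". These helpers only build argv lists; nothing is executed.


def drain_commands(node: str) -> List[List[str]]:
    """Commands to take a node out of scheduling before it is powered off."""
    return [
        # Mark the node unschedulable so no new pods are placed on it.
        # (kubectl drain also cordons, so this step is explicit but redundant.)
        ["kubectl", "cordon", node],
        # Evict the pods already running there. DaemonSet-managed pods are
        # skipped, since their controller would recreate them on the node,
        # and pods using emptyDir volumes are allowed to lose that data.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
    ]


def restore_commands(node: str) -> List[List[str]]:
    """Command to make the node schedulable again after the move."""
    return [["kubectl", "uncordon", node]]


if __name__ == "__main__":
    for cmd in drain_commands("mw2282.codfw.wmnet"):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` if you actually wanted to execute it. Keeping them as data makes the sequence easy to review before anyone touches a production node.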

@Clement_Goubert thanks for the update. Since I can't edit your comment, I'm updating it here.

The move should be something like:
serviceops:

  1. Depool the servers
  2. Downtime the servers
  3. Power down the servers

ops-codfw:

  1. Move the servers to their new location
  2. Re-cable the servers
  3. Configure the interfaces
  4. Power the servers back on

serviceops:

  1. Check the servers boot back ok and httpbb clears
  2. Remove downtime
  3. Repool the servers

How long will this take? A maximum of 10 minutes per server, possibly less.
What would be the deadline for a decision? If we can make it before June 15th, that would be great. We are waiting on https://phabricator.wikimedia.org/T360671, and once that arrives, things will start moving fast for the migration.
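The revised procedure and the time estimate can be sketched as data: ordered (team, step) pairs, the points where work passes between serviceops and ops-codfw, and the worst case implied by "a maximum of 10 minutes per server". That works out to 10 servers × 10 min = 100 min for the original plan. This is a minimal illustration. The helper names are hypothetical, and it assumes that the 10-minute figure covers the ops-codfw hands-on steps and that servers are moved one after another, which the comment doesn't state.

```python
from typing import List, Tuple

# Revised procedure from the comment above, as ordered (team, step) pairs.
PROCEDURE: List[Tuple[str, str]] = [
    ("serviceops", "Depool the servers"),
    ("serviceops", "Downtime the servers"),
    ("serviceops", "Power down the servers"),
    ("ops-codfw", "Move the servers to their new location"),
    ("ops-codfw", "Re-cable the servers"),
    ("ops-codfw", "Configure the interfaces"),
    ("ops-codfw", "Power the servers back on"),
    ("serviceops", "Check the servers boot back ok and httpbb clears"),
    ("serviceops", "Remove downtime"),
    ("serviceops", "Repool the servers"),
]

# Upper bound quoted in the comment; assumed to cover the hands-on part only.
MAX_MINUTES_PER_SERVER = 10


def handoffs(procedure: List[Tuple[str, str]]) -> List[int]:
    """Indices of steps where responsibility passes to a different team."""
    return [
        i for i in range(1, len(procedure))
        if procedure[i][0] != procedure[i - 1][0]
    ]


def checklist(procedure: List[Tuple[str, str]]) -> List[str]:
    """Render the steps as a per-team numbered checklist."""
    lines: List[str] = []
    number = 0
    previous_team = None
    for team, step in procedure:
        if team != previous_team:
            lines.append(f"{team}:")
            number = 0
            previous_team = team
        number += 1
        lines.append(f"  {number}. {step}")
    return lines


def max_move_minutes(server_count: int, per_server: int = MAX_MINUTES_PER_SERVER) -> int:
    """Worst-case hands-on time if servers are moved one after another."""
    return server_count * per_server


if __name__ == "__main__":
    print("\n".join(checklist(PROCEDURE)))
    print(handoffs(PROCEDURE))   # [3, 7]
    print(max_move_minutes(10))  # 100: all ten servers in the original plan
    print(max_move_minutes(1))   # 10: only mw2282, as it turned out
```

The two handoff indices mark where someone has to be available on the other side: ops-codfw starts once the servers are powered down, and serviceops resumes once they are powered back on.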

@Clement_Goubert Hello, just checking back in to see if you have an update for me.

Thanks

So sorry I didn't answer earlier. Apart from mw2282, which has been migrated to k8s, we will decom these hosts, so you don't have to move them. Just tell us when you want to move mw2282 and we'll drain/cordon/downtime it beforehand.

All right, I'll do the draining on Monday at the start of the UTC afternoon, so it's all set for you.

@Papaul All servers except mw2282 decommissioned.

Rails, power, and network cables are prepped for the mw2282 move.

@Clement_Goubert Hey, we have a site visit at eqdfw this Monday at 11:00am CT to check a power issue on our router there, so we cannot do the server move on Monday. Can you reschedule it for this Wednesday at 10:00am CT? Thanks

Mentioned in SAL (#wikimedia-operations) [2024-06-19T13:17:14Z] <kamila_> drained mw2282.codfw.wmnet for T361856

Icinga downtime and Alertmanager silence (ID=8260b65f-a450-48bf-850e-4e458a18597f) set by kamila@cumin1002 for 12:00:00 on 1 host(s) and their services with reason: host move

mw2282.codfw.wmnet

@Clement_Goubert It is a U.S. holiday today. Can we please reschedule this for tomorrow? Thank you, and sorry about that.

No worries, I'll extend the downtime, and we'll leave it like that for you to move.

Icinga downtime and Alertmanager silence (ID=111d8ee1-db67-4ba6-a57a-50da8c8dc4ff) set by cgoubert@cumin1002 for 2 days, 0:00:00 on 1 host(s) and their services with reason: Host move

mw2282.codfw.wmnet

mw2282 move complete. We can close this task. Thanks @Clement_Goubert and @Jhancock.wm

Mentioned in SAL (#wikimedia-operations) [2024-06-20T16:00:38Z] <claime> Repooling and uncordoning mw2282.codfw.wmnet following move - T361856