
codfw: Relocate servers in 10G racks
Open, Medium, Public

Description

We have been receiving some PowerEdge R740xd2 servers and will be receiving more of them in the future. Those servers are (1) very heavy, which makes it difficult to rack them in the high U space of a rack, and (2) a different length, which makes it difficult to access any servers below them for maintenance. The plan is to relocate some servers to free the low U space within some of the 10G racks. Please see below for the list of servers that I am planning on relocating. If you are a service owner of any of those servers, please confirm on the task whether the date and time work for you. Thanks.

Note: After relocation, some servers might have a new switch port within the same rack.

Rack B4 April 28th at 10:30am CT (COMPLETE)

  • logstash2034 from U14 to U27 new switch port xe-4/0/26: @herron No action needed
  • db2096 from U1 to U16 new switch port ge-4/0/15: @Marostegui
  • sessionstore2001 from U2 to U17 new switch port ge-4/0/16 @Eevans

Rack B4 May 6th at 10:30 CT

  • kafka-main2002 from U3 to U18 new switch port xe-4/0/17 @fgiunchedi
  • dbprov2002 from U4 to U19 new switch port xe-4/0/18 @jcrespo will shut down manually/carefully
  • ms-be2053 from U5 to U14/15 new switch port xe-4/0/13 @fgiunchedi no action needed
  • logstash2027 from U6 to U20 new switch port xe-4/0/19 @herron no action needed
  • mc-gp2002 from U8 to U21 new switch port xe-4/0/20 @jijiki no action needed
  • elastic2058 from U9 to U28 @RKemper banned from elasticsearch cluster
  • cp2033 from U10 to U29 @BBlack - already depooled
  • cp2034 from U11 to U30 @BBlack - already depooled
  • ms-be2057 from U12/13 to U3/4 new switch port xe-4/0/1 @fgiunchedi no action needed

Rack C2 April 27: this will be done after the switch replacement (COMPLETE)

  • ms-be2042 from U1/2 to U8/9 new switch port:xe-2/0/7 : @fgiunchedi No action needed
  • ms-be2034 new switch port:xe-2/0/9 : @fgiunchedi no action needed
  • ms-be2035 new switch port:xe-2/0/11 @fgiunchedi no action needed
  • ms-be2048 new switch port:xe-2/0/13 @fgiunchedi no action needed
  • elastic2046 new switch port:xe-2/0/15 @Gehel no action needed
  • elastic2047 new switch port:xe-2/0/16 @Gehel no action needed
  • ms-be2055 new switch port:xe-2/0/17 @fgiunchedi no action needed
  • ms-fe2007 new switch port:xe-2/0/19 @fgiunchedi no action needed
  • dns2001 new switch port:xe-2/0/20 @BBlack no action needed
  • cp2035 new switch port:xe-2/0/26 @BBlack no action needed
  • cp2036 new switch port:xe-2/0/27 @BBlack no action needed
  • elastic2045 from U3 to U29 new switch port:xe-2/0/28: @Gehel No action needed
  • kafka-logging2003 from U4 to U30 new switch port:xe-2/0/29: @fgiunchedi No action needed
  • moss-fe2001 from U5 to U31 new switch port:xe-2/0/30: @fgiunchedi No action needed

Rack C4 April 27th (COMPLETE)

  • logstash2035 from U5 to U13 new switch port:xe-4/0/12: @herron No action needed
  • ms-backup2001 from U6 to U14 new switch port:xe-4/0/13: @jcrespo

Rack D2

Event Timeline

Papaul triaged this task as Medium priority. Mon, Apr 26, 3:08 PM
Papaul updated the task description.
Papaul added a subscriber: BBlack.

ms-backup2002 and ms-backup2001 are not yet fully in production (they will be soon, see T276442), so they can be shut down at any time.

I got confused about the backup* hosts: they can be shut down, but I need to make sure they are not in use at the same time.

I will get db2096 ready for you.

Mentioned in SAL (#wikimedia-operations) [2021-04-28T06:00:02Z] <marostegui> Stop MySQL on db2096 (x1 codfw) T281135

@Papaul db2096 is now off, so you can proceed as needed.


Almost too late to the party here, but it's fine to relocate sessionstore2001. Will it be put under maintenance in Icinga?

Papaul added a subscriber: RKemper.
Papaul added a subscriber: elukey.

@jcrespo can better coordinate the dbprov downtimes; I am swapping names there :)

@Papaul dbprov2002 should be shut down carefully to make sure data is kept intact (I'd prefer to do so). Otherwise, it can be down for e.g. 1 day. Will it need IP changes done beforehand?

Otherwise, I can shut it down tomorrow during my morning, after backups complete, and downtime it and its management host.

elukey added subscribers: jijiki, Joe.

@jcrespo no IP change, just a switch port change

Mentioned in SAL (#wikimedia-operations) [2021-05-06T10:19:36Z] <jynus> stop dbprov2002 in advance of maintenance T281135

@Papaul could you turn dbprov2002 back on when you finish all needed maintenance? That's all it will need to be back into service. Thank you.


@BBlack I had meetings from 12:30 PM to 4 PM, so I didn't have the chance to work on the cp nodes. You can re-pool those, since I will not be able to get back to them until the 24th of May. Thanks.