
rack & initial on-site setup of ms-be2016-2021
Closed, ResolvedPublic

Description

ms-be2016 through ms-be2021 have been ordered via https://rt.wikimedia.org/Ticket/Display.html?id=9624.

When these arrive (ETA 2015-10-20), they'll need to be racked and set up. Please receive in the order and attach the packing slip to the RT ticket.

Discussion between @RobH & @fgiunchedi resulted in the following layout (details/determination in comments below):

ms-be2016 : a2
ms-be2017 : a7
ms-be2018 : b2
ms-be2019 : b7
ms-be2020 : c2
ms-be2021 : c7

All 6 systems will need their mgmt interfaces & BIOS set up, as well as mgmt DNS entries generated.

Event Timeline

RobH created this task. Oct 5 2015, 9:26 PM
RobH raised the priority of this task to Needs Triage.
RobH updated the task description.
RobH added a subscriber: RobH.
Restricted Application added subscribers: Matanya, Aklapper. Oct 5 2015, 9:26 PM
RobH added a comment. Edited Oct 5 2015, 9:26 PM

We'll need to determine the racking location of these systems. Please note these come with both 1G and 10G connection options, but we'll be using the 10G options with DAC cables in the 10G racks in codfw.

I'm not sure of the current requirements (whether these hosts will be in different clusters or the same one, or how they need to be racked relative to the other systems). All the older ms-be2001-2015 hosts use 1G network connections, so none of the 6 new ms-be hosts will go in the same racks as any of the existing ms-be systems.

As these use 10G, they would be racked in the 10G racks: A2, A7, B2, B7, C2, C7, D2, D7. As we have 6 hosts incoming, and we tend to fill row A first, I'd suggest one host each in B2, B7, C2, C7, D2, D7.

RobH updated the task description. Oct 5 2015, 9:28 PM
RobH set Security to None.
RobH added a subscriber: fgiunchedi.
RobH assigned this task to fgiunchedi. Oct 5 2015, 11:06 PM

I'd like to get @fgiunchedi's viewpoint on this racking, just to ensure I'm not overlooking anything. If my proposed racking locations work, please assign back to me so I can update the task accordingly.

we have at least two options for expansion:

  1. grow the current allocation of three swift zones (i.e. a row is a zone) by allocating 2x machines in each of A/B/C
  2. create a new zone in row D with the new machines

if row A still has space I think we should go with 1. to keep things symmetric, thus A2, A7, B2, B7, C2, C7 would do it
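As a rough illustration of option 1, the sketch below counts how many new hosts each swift zone would receive. It assumes, as described above, that a row is a zone, so the zone is simply the row letter of the rack name. The names `OPTION_1_RACKS`, `swift_zone` and `hosts_per_zone` are hypothetical and not part of any swift tooling.

```python
from collections import Counter

# Racks proposed for option 1 in the comment above (one new host per rack).
OPTION_1_RACKS = ["A2", "A7", "B2", "B7", "C2", "C7"]


def swift_zone(rack: str) -> str:
    """Return the zone for a rack, assuming one zone per row (the row letter)."""
    return rack[0].upper()


def hosts_per_zone(racks: list[str]) -> Counter:
    """Count how many hosts land in each zone for a list of rack locations."""
    return Counter(swift_zone(rack) for rack in racks)


if __name__ == "__main__":
    for zone, count in sorted(hosts_per_zone(OPTION_1_RACKS).items()):
        print(f"zone {zone}: {count} new hosts")
```

With these racks every existing zone grows by the same number of machines, which is the symmetry argued for above.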

RobH added a comment. Oct 8 2015, 3:07 PM

Row A does have space; my suggestion was purely an effort to fill all the racks evenly. A2 and A7 have plenty of space.

So then reviewing @fgiunchedi's above comment, I'd propose racking the new systems in the following locations:

ms-be2016 : a2
ms-be2017 : a7
ms-be2018 : b2
ms-be2019 : b7
ms-be2020 : c2
ms-be2021 : c7

This is the preferred layout of the two options @fgiunchedi presents.

RobH renamed this task from [determine] rack ms-be2016-2021 to rack & initial on-site setup of ms-be2016-2021. Oct 8 2015, 3:09 PM
RobH reassigned this task from fgiunchedi to Papaul.
RobH triaged this task as Normal priority.
RobH updated the task description.
RobH updated the task description. Oct 8 2015, 9:31 PM

@Papaul, there will likely be some BIOS settings to change too (related: T112627: bios defaults on new hardware orders):

  • bios mode set to UEFI
  • power settings/profile ("power" ?)
  • all disks presented individually (i.e. each in raid0)

ms-be2016 10.193.1.12 port xe-0/2/7
ms-be2017 10.193.1.13 port xe-0/7/7
ms-be2018 10.193.1.14 port xe-0/2/7
ms-be2019 10.193.1.15 port xe-0/7/7
ms-be2020 10.193.1.16 port xe-0/2/6
ms-be2021 10.193.1.17 port xe-0/7/6

I had the switch number in the wrong position in the port names, so here is the corrected list:
ms-be2016 10.193.1.12 port xe-2/0/7
ms-be2017 10.193.1.13 port xe-7/0/7
ms-be2018 10.193.1.14 port xe-2/0/7
ms-be2019 10.193.1.15 port xe-7/0/7
ms-be2020 10.193.1.16 port xe-2/0/6
ms-be2021 10.193.1.17 port xe-7/0/6
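The correction above can be sanity-checked with a small script. This is only a sketch: it assumes that the first number in the `xe-N/0/P` port name is the switch number and that it equals the rack number (a2 gives `xe-2/...`, a7 gives `xe-7/...`), which is what the corrected list suggests but is not stated anywhere in this task. Hostnames follow the rack layout in the task description, and `port_matches_rack` is a hypothetical helper.

```python
import re

# Rack layout from the task description.
RACKS = {
    "ms-be2016": "a2", "ms-be2017": "a7",
    "ms-be2018": "b2", "ms-be2019": "b7",
    "ms-be2020": "c2", "ms-be2021": "c7",
}

# Corrected port assignments from the list above.
PORTS = {
    "ms-be2016": "xe-2/0/7", "ms-be2017": "xe-7/0/7",
    "ms-be2018": "xe-2/0/7", "ms-be2019": "xe-7/0/7",
    "ms-be2020": "xe-2/0/6", "ms-be2021": "xe-7/0/6",
}

PORT_RE = re.compile(r"^xe-(\d+)/(\d+)/(\d+)$")


def port_matches_rack(rack: str, port: str) -> bool:
    """True if the first number of the port name equals the rack number.

    Assumption: the switch number in a rack matches that rack's number.
    """
    match = PORT_RE.match(port)
    if match is None:
        return False
    return int(match.group(1)) == int(rack[1:])


if __name__ == "__main__":
    for host, port in PORTS.items():
        status = "ok" if port_matches_rack(RACKS[host], port) else "MISMATCH"
        print(f"{host} {RACKS[host]} {port} {status}")
```

Under the same assumption, the first version of the list (for example `xe-0/2/7` for a host in a2) would be flagged as a mismatch.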

RobH added a comment. Oct 27 2015, 9:58 PM

I've committed all the needed changes for the switch port description, enable, and vlan for ms-be2016 through ms-be2021.

On ms-be2018, one of the onboard NICs is being detected as eth0 and the 10G NIC as eth1; the reason is not clear.

~ # ip address list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 5c:b9:01:fe:ba:00 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 3c:a8:2a:17:15:d8 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 5c:b9:01:fe:ba:01 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 3c:a8:2a:17:15:dc brd ff:ff:ff:ff:ff:ff
6: eth4: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 5c:b9:01:fe:ba:02 brd ff:ff:ff:ff:ff:ff
7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc mq qlen 1000
    link/ether 5c:b9:01:fe:ba:03 brd ff:ff:ff:ff:ff:ff
~ # dmesg | grep -i eth
[    3.838587] bnx2x: Broadcom NetXtreme II 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.78.17-0 (2013/04/11)
[    4.151123] tg3 0000:02:00.0 eth0: Tigon3 [partno(N/A) rev 5719001] (PCI Express) MAC address 5c:b9:01:fe:ba:00
[    4.199051] tg3 0000:02:00.0 eth0: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    4.245726] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    4.283461] tg3 0000:02:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
[    4.431226] tg3 0000:02:00.1 eth2: Tigon3 [partno(N/A) rev 5719001] (PCI Express) MAC address 5c:b9:01:fe:ba:01
[    4.479222] tg3 0000:02:00.1 eth2: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    4.525782] tg3 0000:02:00.1 eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    4.562865] tg3 0000:02:00.1 eth2: dma_rwctrl[00000001] dma_mask[64-bit]
[    4.619364] tg3 0000:02:00.2 eth4: Tigon3 [partno(N/A) rev 5719001] (PCI Express) MAC address 5c:b9:01:fe:ba:02
[    4.667089] tg3 0000:02:00.2 eth4: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    4.713289] tg3 0000:02:00.2 eth4: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    4.750533] tg3 0000:02:00.2 eth4: dma_rwctrl[00000001] dma_mask[64-bit]
[    4.819441] tg3 0000:02:00.3 eth5: Tigon3 [partno(N/A) rev 5719001] (PCI Express) MAC address 5c:b9:01:fe:ba:03
[    4.866881] tg3 0000:02:00.3 eth5: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    4.913850] tg3 0000:02:00.3 eth5: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    4.950694] tg3 0000:02:00.3 eth5: dma_rwctrl[00000001] dma_mask[64-bit]
[    7.544279] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[    8.454772] bnx2x 0000:04:00.0 eth1: using MSI-X  IRQs: sp 116  fp[0] 118 ... fp[7] 125
[    8.633839] bnx2x 0000:04:00.0 eth1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[    9.479214] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
[   10.367891] bnx2x 0000:04:00.1 eth3: using MSI-X  IRQs: sp 126  fp[0] 128 ... fp[7] 135
[   10.633693] IPv6: ADDRCONF(NETDEV_UP): eth3: link is not ready
[   11.409527] IPv6: ADDRCONF(NETDEV_UP): eth4: link is not ready
[   12.145266] IPv6: ADDRCONF(NETDEV_UP): eth5: link is not ready
[   13.998259] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
~ #


Scratch that: both 2018 and 2019 now boot fine and the 10G interface is detected as eth0. For 2019, the embedded NIC was enabled for 'network boot'; disabling that did the trick. For 2018, disabling the 10G NIC and re-enabling it apparently did the trick as well (note that a full reboot is needed for each step).
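To re-check the interface ordering after each reboot, the dmesg output can be parsed instead of read by eye. The following is a minimal sketch, assuming the line format shown in the output above (`[time] driver pci-address ethN: ...`) and that `bnx2x` is the 10G driver and `tg3` the onboard 1G driver, as the dmesg lines indicate. The helper names are hypothetical.

```python
import re

# Matches lines such as:
# [    4.151123] tg3 0000:02:00.0 eth0: Tigon3 ...
DMESG_LINE = re.compile(
    r"\]\s+(?P<driver>\w+)\s+"
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+"
    r"(?P<iface>eth\d+):"
)


def drivers_by_interface(dmesg_text: str) -> dict[str, str]:
    """Map each ethN name to the driver that first claimed it in dmesg."""
    drivers: dict[str, str] = {}
    for line in dmesg_text.splitlines():
        match = DMESG_LINE.search(line)
        if match:
            drivers.setdefault(match.group("iface"), match.group("driver"))
    return drivers


def ten_gig_is_eth0(drivers: dict[str, str]) -> bool:
    """True if eth0 belongs to the 10G driver (assumed to be bnx2x)."""
    return drivers.get("eth0") == "bnx2x"


# Two lines copied from the ms-be2018 output above.
SAMPLE = """\
[    4.151123] tg3 0000:02:00.0 eth0: Tigon3 [partno(N/A) rev 5719001] (PCI Express) MAC address 5c:b9:01:fe:ba:00
[    8.454772] bnx2x 0000:04:00.0 eth1: using MSI-X  IRQs: sp 116  fp[0] 118 ... fp[7] 125
"""

if __name__ == "__main__":
    found = drivers_by_interface(SAMPLE)
    print(found)
    print("10G is eth0:", ten_gig_is_eth0(found))
```

On a live host the same functions could be fed the text of `dmesg` rather than the sample string.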

Change 250977 had a related patch set uploaded (by Filippo Giunchedi):
Fixed production DNS for ms-be2020 and ms-be2021

https://gerrit.wikimedia.org/r/250977

Change 250977 merged by Filippo Giunchedi:
Fixed production DNS for ms-be2020 and ms-be2021

https://gerrit.wikimedia.org/r/250977

Papaul closed this task as Resolved. Nov 4 2015, 5:55 PM

Server OS installation complete.