
rack/setup/install LVS200[7-10]
Open, Normal, Public

Description

This task will track the receiving, racking, setup, and installation of LVS200[7-10].

Racking proposal

server   Rack  NIC1    NIC2    NIC3    NIC4
LVS2007  A2    asw-a2  asw-b2  asw-c2  asw-d2
LVS2008  B2    asw-b2  asw-a2  asw-c2  asw-d2
LVS2009  C2    asw-c2  asw-a2  asw-b2  asw-d2
LVS2010  D2    asw-d2  asw-a2  asw-b2  asw-c2
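The proposal appears to follow one rule: NIC1 goes to the access switch in the server's own row, and NIC2-NIC4 go to the other three rows in alphabetical order, so every server reaches all four rows. A minimal Python sketch that encodes the table and checks it against that rule; `PLAN`, `expected_switches` and `check_plan` are hypothetical names for illustration, and the rule itself is inferred from the table rather than stated in the task.

```python
# Racking proposal from the table above: server -> (rack, [switch for NIC1..NIC4]).
PLAN = {
    "lvs2007": ("A2", ["asw-a2", "asw-b2", "asw-c2", "asw-d2"]),
    "lvs2008": ("B2", ["asw-b2", "asw-a2", "asw-c2", "asw-d2"]),
    "lvs2009": ("C2", ["asw-c2", "asw-a2", "asw-b2", "asw-d2"]),
    "lvs2010": ("D2", ["asw-d2", "asw-a2", "asw-b2", "asw-c2"]),
}

ROWS = ["a", "b", "c", "d"]


def expected_switches(rack):
    """NIC1 -> switch in the server's own row; NIC2-NIC4 -> the other rows in order."""
    home = rack[0].lower()   # "C2" -> "c"
    position = rack[1:]      # "C2" -> "2"
    others = [row for row in ROWS if row != home]
    return [f"asw-{row}{position}" for row in [home] + others]


def check_plan(plan):
    """Return a list of problems; an empty list means the plan follows the rule."""
    problems = []
    for server, (rack, nics) in plan.items():
        expected = expected_switches(rack)
        if nics != expected:
            problems.append(f"{server}: got {nics}, expected {expected}")
    return problems


print(check_plan(PLAN))  # prints []
```

Encoding the plan as data makes it easy to re-check after any change of rack or switch assignment.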

@BBlack Since lvs2007 will be racked in the same rack as lvs200[1-3], and lvs2008 in the same rack as lvs200[4-6], I can re-use the cables/fibers from one of the old LVS servers if we decommission one of them. For lvs2009 and lvs2010, I can pull new cables/fibers. Let me know what you think.

LVS2007:

  • receive in system on procurement task T193820
  • rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan) - end of on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run - set to staged in netbox
  • handoff for service implementation - set to active when performing its service

LVS2008:

  • receive in system on procurement task T193820
  • rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan) - end of on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run - set to staged in netbox
  • handoff for service implementation - set to active when performing its service

LVS2009:

  • receive in system on procurement task T193820
  • rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan) - end of on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run
  • handoff for service implementation

LVS2010:

  • receive in system on procurement task T193820
  • rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan) - end of on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run
  • handoff for service implementation

Wiring progress

server   NIC1      NIC2      NIC3      NIC4
lvs2007
lvs2008
lvs2009  complete  complete  complete  complete
lvs2010  complete  complete  complete  complete

Event Timeline

Papaul triaged this task as Normal priority. Jun 6 2018, 3:06 PM
Papaul created this task.
Papaul created this object in space Restricted Space.
Restricted Application added a project: procurement. Jun 6 2018, 3:06 PM
Restricted Application added a subscriber: Aklapper.
BBlack added a comment. Jun 6 2018, 3:11 PM

This plan looks good, thanks! We should be able to do the decoms for cable re-use you're recommending as well. We might need to leave lvs2007 for last, after the others are brought online, so that we can first switch the work of one of lvs200[1-3] over to one of the new LVSes.

Papaul shifted this object from the Restricted Space space to the S1 Public space. Jun 7 2018, 2:54 AM
Papaul removed a project: procurement.
Papaul updated the task description. Jun 8 2018, 12:38 AM
Papaul updated the task description. Jun 8 2018, 12:48 AM
Vgutierrez updated the task description. Jun 8 2018, 7:18 AM
Papaul updated the task description. Jun 11 2018, 3:59 PM
Papaul updated the task description. Jun 11 2018, 5:30 PM
BBlack moved this task from Triage to Hardware on the Traffic board. Jun 11 2018, 5:35 PM
Papaul updated the task description. Jun 12 2018, 12:17 AM

Change 439803 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt & production DNS entries for lvs200[7-10]

https://gerrit.wikimedia.org/r/439803

Papaul updated the task description. Jun 12 2018, 12:45 AM

Change 439803 merged by Dzahn:
[operations/dns@master] DNS: Add mgmt & production DNS entries for lvs200[7-10]

https://gerrit.wikimedia.org/r/439803

Change 440360 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address and netboot entries for lvs2009 and lvs2010

https://gerrit.wikimedia.org/r/440360

Change 440360 merged by Dzahn:
[operations/puppet@production] DHCP: Add MAC address and netboot entries for lvs2009 and lvs2010

https://gerrit.wikimedia.org/r/440360

Papaul added a subscriber: ayounsi. Jun 15 2018, 3:14 PM

@ayounsi @BBlack I am getting the network error message below during the install on both lvs2009 and lvs2010. Please advise. Thanks.

                                                                           
                                                                             
                                                                             
┌────────────────────┤ [!!] Configure the network ├─────────────────────┐    
│                                                                       │    
│                   Network autoconfiguration failed                    │    
│ Your network is probably not using the DHCP protocol. Alternatively,  │    
│ the DHCP server may be slow or some network hardware is not working   │    
│ properly.                                                             │    
│                                                                       │    
│                              <Continue>                               │    
│                                                                       │    
└───────────────────────────────────────────────────────────────────────┘

Log on install2002 for lvs2010:

DHCPDISCOVER from 00:0a:f7:f0:02:40 via 10.192.48.2
Jun 15 15:06:56 install2002 dhcpd[18272]: DHCPOFFER on 10.192.49.7 to 00:0a:f7:f0:02:40 via 10.192.48.2
Jun 15 15:06:56 install2002 dhcpd[18272]: DHCPDISCOVER from 00:0a:f7:f0:02:40 via 10.192.48.3
Jun 15 15:06:56 install2002 dhcpd[18272]: DHCPOFFER on 10.192.49.7 to 00:0a:f7:f0:02:40 via 10.192.48.3
Jun 15 15:07:00 install2002 dhcpd[18272]: DHCPREQUEST for 10.192.49.7 (208.80.153.53) from 00:0a:f7:f0:02:40 via 10.192.48.2
Jun 15 15:07:00 install2002 dhcpd[18272]: DHCPACK on 10.192.49.7 to 00:0a:f7:f0:02:40 via 10.192.48.2
Jun 15 15:07:00 install2002 dhcpd[18272]: DHCPREQUEST for 10.192.49.7 (208.80.153.53) from 00:0a:f7:f0:02:40 via 10.192.48.3
Jun 15 15:07:00 install2002 dhcpd[18272]: DHCPACK on 10.192.49.7 to 00:0a:f7:f0:02:40 via 10.192.48.3
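This log shows install2002 completing the full DISCOVER/OFFER/REQUEST/ACK exchange for 00:0a:f7:f0:02:40, with each message seen via two relay addresses (10.192.48.2 and 10.192.48.3), so from the DHCP server's side a lease was handed out. A small Python sketch for summarizing such dhcpd lines per MAC; the regex is written against the lines quoted here and is an assumption about dhcpd's log format in general, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Written against the dhcpd lines quoted above, e.g.
#   "DHCPOFFER on 10.192.49.7 to 00:0a:f7:f0:02:40 via 10.192.48.2"
# (an assumption about the log format, not a full dhcpd log parser).
LINE_RE = re.compile(r"(DHCP[A-Z]+)\b.*?((?:[0-9a-f]{2}:){5}[0-9a-f]{2}) via (\S+)")

HANDSHAKE = ("DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK")


def summarize(log_text):
    """Map MAC address -> DHCP message type -> set of relay addresses."""
    seen = defaultdict(lambda: defaultdict(set))
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            msg_type, mac, relay = match.groups()
            seen[mac][msg_type].add(relay)
    return seen


def handshake_complete(seen, mac):
    """True if all four handshake messages were logged for this MAC."""
    return all(msg in seen.get(mac, {}) for msg in HANDSHAKE)
```

Run over the lvs2010 excerpt above, every message type for 00:0a:f7:f0:02:40 appears via both relays. A complete server-side exchange points the investigation at the client side (here, which NIC the installer configured) rather than at the DHCP configuration.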

I chatted with @ayounsi, and he confirmed that both servers were in the correct VLANs. On my end, I unplugged the other 3 NICs from both servers, leaving only NIC1 connected, and this fixed the issue.

Papaul updated the task description. Jun 18 2018, 4:13 PM
Papaul updated the task description. Jun 18 2018, 5:13 PM
Papaul reassigned this task from Papaul to BBlack. Jun 18 2018, 5:17 PM

@BBlack lvs2009 and lvs2010 are ready. For switch port information, please see T196946. Once they are up, we can decommission lvs2004 in row B, rack B2, so I can set up lvs2008. Once done, please assign the task back to me.

Thanks.

Vvjjkkii renamed this task from rack/setup/install LVS200[7-10] to bjbaaaaaaa. Jul 1 2018, 1:05 AM
Vvjjkkii removed BBlack as the assignee of this task.
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
ema renamed this task from bjbaaaaaaa to rack/setup/install LVS200[7-10]. Jul 2 2018, 9:33 AM
ema assigned this task to BBlack.
ema updated the task description.
ema lowered the priority of this task from High to Normal. Jul 2 2018, 9:58 AM
ema added subscribers: Aklapper, gerritbot.

@Papaul on lvs2009 the onboard NICs need to be disabled in the BIOS (on lvs2010 they're already disabled):

lvs2009
root@lvs2009:~# dmesg |grep tg3
[    2.524752] tg3.c:v3.137 (May 11, 2014)
[    2.545435] tg3 0000:04:00.0 eth4: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address d0:94:66:59:eb:ab
[    2.545440] tg3 0000:04:00.0 eth4: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    2.545443] tg3 0000:04:00.0 eth4: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    2.545447] tg3 0000:04:00.0 eth4: dma_rwctrl[00000001] dma_mask[64-bit]
[    2.564921] tg3 0000:04:00.1 eth5: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address d0:94:66:59:eb:ac
[    2.564926] tg3 0000:04:00.1 eth5: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    2.564929] tg3 0000:04:00.1 eth5: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    2.564933] tg3 0000:04:00.1 eth5: dma_rwctrl[00000001] dma_mask[64-bit]
[    2.790436] tg3 0000:04:00.1 eno2: renamed from eth5
[    2.834331] tg3 0000:04:00.0 eno1: renamed from eth4
lvs2010
vgutierrez@lvs2010:~$ sudo -i dmesg |grep tg3
vgutierrez@lvs2010:~$
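The check above amounts to asking whether the tg3 driver (bound here to the onboard Broadcom BCM5720 ports) claims anything at boot. A small Python sketch that extracts the tg3-managed interfaces from `dmesg` text, with their final names after renaming and their MAC addresses; the regexes are written against the lvs2009 lines shown here and are assumptions about the general message format, and `tg3_interfaces` is a hypothetical name.

```python
import re

# Patterns based on the lvs2009 dmesg lines above.
PROBE_RE = re.compile(
    r"tg3 (\S+) (\w+): Tigon3 .*MAC address ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
)
RENAME_RE = re.compile(r"tg3 (\S+) (\w+): renamed from (\w+)")


def tg3_interfaces(dmesg_text):
    """Return {pci_address: (interface_name, mac)} for ports bound to tg3."""
    found = {}
    for line in dmesg_text.splitlines():
        probe = PROBE_RE.search(line)
        rename = RENAME_RE.search(line)
        if probe:
            pci, name, mac = probe.groups()
            found[pci] = (name, mac)
        elif rename and rename.group(1) in found:
            pci, new_name, _old_name = rename.groups()
            found[pci] = (new_name, found[pci][1])  # keep the MAC, update the name
    return found
```

For the lvs2009 output this yields eno1 and eno2 (d0:94:66:59:eb:ab and d0:94:66:59:eb:ac); for lvs2010, where the onboard ports are already disabled, it returns an empty dict.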

@ayounsi interface naming in lvs2009 and lvs2010:

NIC   current name (lvs2009)  current name (lvs2010)
nic1  enp59s0f0               enp59s0f0
nic2  enp59s0f1d1             enp59s0f1d1
nic3  enp175s0f0              enp175s0f0
nic4  enp175s0f1d1            enp175s0f1d1
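These are systemd/udev "predictable" interface names. For PCI-path names, `en` stands for Ethernet and `p<bus>s<slot>f<function>` encodes the PCI location, with the bus number written in decimal (lspci shows it in hex); the optional `d<dev_port>` suffix carries the port's `dev_port` value. Onboard ports instead get `eno<index>` names, like the eno1/eno2 seen in the lvs2009 dmesg output above. A minimal Python sketch that decodes just these two forms, not the full naming scheme; `decode_ifname` is a hypothetical name and PCI domain 0000 is assumed.

```python
import re

# Only the two name shapes seen in this task are handled:
#   en p<bus> s<slot> [f<function>] [d<dev_port>]   -> PCI path
#   en o<index>                                      -> onboard device index
PCI_RE = re.compile(r"^enp(\d+)s(\d+)(?:f(\d+))?(?:d(\d+))?$")
ONBOARD_RE = re.compile(r"^eno(\d+)$")


def decode_ifname(name):
    """Decode a predictable interface name into its hardware location."""
    pci = PCI_RE.match(name)
    if pci:
        bus, slot, function, dev_port = (int(g or 0) for g in pci.groups())
        # Bus/slot are decimal in the name; render them lspci-style in hex.
        # PCI domain 0000 is assumed here.
        return {"pci_address": f"0000:{bus:02x}:{slot:02x}.{function}",
                "dev_port": dev_port}
    onboard = ONBOARD_RE.match(name)
    if onboard:
        return {"onboard_index": int(onboard.group(1))}
    raise ValueError(f"unsupported interface name: {name!r}")
```

Read this way, nic1/nic2 are the two functions of the device at PCI bus 0x3b, and nic3/nic4 the two functions of the device at bus 0xaf.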

LLDP neighbors match the task description:

lvs2009 LLDP
root@lvs2009:~# lldpcli show neighbors |egrep "Interface|SysName|PortDescr"
Interface:    enp59s0f0, via: LLDP, RID: 1, Time: 20 days, 14:08:06
    SysName:      asw-c-codfw
    PortDescr:    lvs2009:nic1
Interface:    enp59s0f1d1, via: LLDP, RID: 2, Time: 0 day, 00:13:10
    SysName:      asw-a-codfw
    PortDescr:    lvs2009:nic2
Interface:    enp175s0f0, via: LLDP, RID: 4, Time: 0 day, 00:12:54
    SysName:      asw-b-codfw
    PortDescr:    lvs2009:nic3
Interface:    enp175s0f1d1, via: LLDP, RID: 3, Time: 0 day, 00:13:00
    SysName:      asw-d-codfw
    PortDescr:    lvs2009:nic4
lvs2010 LLDP neighbors
root@lvs2010:~# lldpcli show neighbors |egrep "Interface|SysName|PortDescr"
Interface:    enp59s0f0, via: LLDP, RID: 1, Time: 31 days, 01:34:58
    SysName:      asw-d-codfw
    PortDescr:    lvs2010:nic1
Interface:    enp59s0f1d1, via: LLDP, RID: 4, Time: 0 day, 00:02:35
    SysName:      asw-a-codfw
    PortDescr:    lvs2010:nic2
Interface:    enp175s0f0, via: LLDP, RID: 2, Time: 0 day, 00:02:41
    SysName:      asw-b-codfw
    PortDescr:    lvs2010:nic3
Interface:    enp175s0f1d1, via: LLDP, RID: 3, Time: 0 day, 00:02:39
    SysName:      asw-c-codfw
    PortDescr:    lvs2010:nic4
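In this output, SysName is the neighboring switch and PortDescr is that switch port's description, so the comparison against the racking proposal can be automated. A minimal Python sketch, assuming the filtered `lldpcli show neighbors | egrep ...` output exactly as shown above; `EXPECTED_ROWS` is transcribed from the proposal table, the row letter is taken from switch names of the form asw-<row>-codfw, and the function names are hypothetical.

```python
import re

# Row letter per NIC, transcribed from the racking proposal (asw-<row>2).
EXPECTED_ROWS = {
    "lvs2009": {"nic1": "c", "nic2": "a", "nic3": "b", "nic4": "d"},
    "lvs2010": {"nic1": "d", "nic2": "a", "nic3": "b", "nic4": "c"},
}


def parse_neighbors(text):
    """Parse the filtered lldpcli output into {interface: {field: value}}."""
    neighbors, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Interface:"):
            # "Interface:    enp59s0f0, via: LLDP, ..." -> "enp59s0f0"
            current = line.split()[1].rstrip(",")
            neighbors[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)  # PortDescr values contain ':' too
            neighbors[current][key] = value.strip()
    return neighbors


def mismatches(host, neighbors):
    """List interfaces whose switch row disagrees with the plan for their NIC."""
    problems = []
    for iface, info in neighbors.items():
        nic = info["PortDescr"].split(":")[-1]  # "lvs2009:nic1" -> "nic1"
        row = re.match(r"asw-([a-z])-", info["SysName"]).group(1)
        expected = EXPECTED_ROWS[host][nic]
        if row != expected:
            problems.append(f"{host} {iface} ({nic}): asw-{row}, expected asw-{expected}")
    return problems
```

Fed the output above, the sketch reports no mismatches for either host, which agrees with the conclusion in the comment.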

Change 451607 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/dns@master] lvs2007-lvs2010 production DNS entries, all vlans

https://gerrit.wikimedia.org/r/451607

Switch port descriptions updated.

Change 451607 merged by Vgutierrez:
[operations/dns@master] lvs2007-lvs2010 production DNS entries, all vlans

https://gerrit.wikimedia.org/r/451607

Onboard NICs disabled.

Papaul updated the task description. Aug 13 2018, 4:22 PM
Papaul updated the task description. Aug 16 2018, 3:57 PM

Change 464584 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Partman: Add lvs2007 and lvs2008 to netboot.cfg

https://gerrit.wikimedia.org/r/464584

Change 464584 merged by Vgutierrez:
[operations/puppet@production] Partman: Add lvs2007 and lvs2008 to netboot.cfg

https://gerrit.wikimedia.org/r/464584

@Papaul we need to re-wire lvs2009 & lvs2010 to connect the first interface (enp175s0f0) to the main row for each server.

@Vgutierrez the first NIC of each server is connected to the switch in the rack where the server is racked.
Example:
lvs2010 is racked in D2 so the first NIC is connected to asw-d2-codfw (xe-2/0/44)
lvs2009 is racked in C2 so the first NIC is connected to asw-c2-codfw (xe-2/0/44)

What is the MAC address of enp175s0f0?

@Papaul at least on lvs2010, the Debian installer seems to think that enp175s0f0 is the first NIC; its MAC address is 00:0a:f7:f0:0c:10.
On lvs2009 the MAC address is 00:0a:f7:f0:0b:70.

@Vgutierrez on lvs2010, can you tell me which interface has this MAC address?
Routing instance : default-switch

Vlan                MAC                 MAC         Age    Logical
name                address             flags              interface
private1-d-codfw    00:0a:f7:f0:02:40   D             -   xe-2/0/44.0
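On the host side, the question "which interface has 00:0a:f7:f0:02:40?" can be answered from sysfs, where Linux exposes each interface's MAC address in /sys/class/net/<iface>/address. A minimal Python sketch; `interface_for_mac` is a hypothetical name, and the sysfs root is a parameter only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional


def interface_for_mac(mac: str, sysfs_net: Path = Path("/sys/class/net")) -> Optional[str]:
    """Return the first interface (sorted by name) whose MAC matches, or None.

    Comparison is case-insensitive. Bonds and VLAN sub-interfaces can share a
    MAC with their parent, in which case only the first match is returned.
    """
    wanted = mac.strip().lower()
    for iface in sorted(sysfs_net.iterdir()):
        address = iface / "address"
        if address.is_file() and address.read_text().strip().lower() == wanted:
            return iface.name
    return None


# On lvs2010 this would be run as, e.g.:
# print(interface_for_mac("00:0a:f7:f0:02:40"))
```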

@Vgutierrez which position is it: 2nd, 3rd or 4th? enp175s0f0 is the 1st.

@Vgutierrez recabling done on both servers, as you requested.

RobH added a comment (edited). Nov 16 2018, 5:39 PM

I've updated the firmware for bios/idrac/network on lvs2009 & lvs2010.

lvs2007 & lvs2008 don't respond to mgmt interface connection attempts and do not ping. Shouldn't their mgmt be set up at this point? (Someone checked off the boxes for having tested bios/serial/drac, which means they should work.)

RobH added a comment (edited). Nov 16 2018, 5:42 PM

I don't want to upload Dell's firmware drivers to our systems (I'm sure that is against some user agreement for downloading the Dell software!), so I'll just link them here:

https://www.dell.com/support/home/us/en/04/product-support/servicetag/9lbtqp2/drivers

Yes, the link is for lvs2010 specifically, but that is fine since lvs2007-lvs2010 are identical hardware. Also, the Dell BIOS firmware update is smart and will NOT apply updates that aren't signed for the hardware in the system.

That page has the following three firmware updates, which should be downloaded and applied to lvs2007 & lvs2008:

  • Dell EMC Server PowerEdge BIOS R440/R540/T440 Version 1.5.6
  • iDRAC with Lifecycle Controller, 3.21.23.22
  • Broadcom NetXtreme-E Network Device Firmware and Configuration 20.8

I apply these via the DRAC's https:// web interface (System Update option), then watch the update via the serial console and the web interface.

So, the NIC issue reported in T203194 seems to be fixed after upgrading the NIC firmware to version 21.40 (https://www.dell.com/support/home/us/en/04/drivers/driversdetails?driverid=3x5g0).

Could we get the FW upgraded in lvs2007-2010 as well?

RobH updated the task description. Apr 18 2019, 10:15 PM
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board. Apr 25 2019, 5:15 PM

Hi @Vgutierrez, just following up on this to see if there is an ETA, since these are supposed to replace lvs2001-2006, which are all past their 5-year mark and have the following hardware issues associated with them:

https://phabricator.wikimedia.org/T148017
https://phabricator.wikimedia.org/T192082
https://phabricator.wikimedia.org/T209337
https://phabricator.wikimedia.org/T213417

Thanks,
Willy

Dzahn removed a subscriber: Dzahn. Fri, Oct 4, 10:17 PM

We can proceed as long as the firmware has been upgraded. Could you confirm that, @Papaul?

@Vgutierrez thanks for the update. The plan was for us to decommission one LVS in rack A2 and one in rack B2 so I can use the existing cables to set up lvs2007 and lvs2008. Both lvs2007 and lvs2008 are racked but not set up yet, following the cable re-use plan proposed to @BBlack in the task description. If the plan has changed, please let me know.

For lvs2009 and lvs2010, I can do the upgrade. Please just let me know when those systems can be depooled and powered down.

Thanks

@Papaul you can proceed at will with lvs2009 and lvs2010 because they are not handling production traffic at the moment

@Vgutierrez the firmware upgrade is complete on both servers.