
relocate/reimage cloudvirt1009 with 10G interfaces
Closed, Resolved · Public

Description

updated checklist for request

cloudvirt1009 migration to 10G:

  • put system offline in all checks for maintenance window
  • relocate to 10G rack and update netbox
  • enable PXE for 10G interfaces
  • update switch configuration for new primary 10G NIC
  • update switch configuration and attach secondary 10G port
  • remove old switch port info
  • PXE boot and reimage system
  • reintroduce system into service cluster

original request

The server cloudvirt1009.eqiad.wmnet has been reallocated from a previous cluster and renamed from labvirt1009.eqiad.wmnet (see parent task).

Before we put any workload on it, we would like to evaluate whether it is possible to get 10G for this server.

However, ethtool reports only link modes up to 1000baseT on both interfaces:

aborrero@cloudvirt1009:~ $ sudo ethtool ens3f0
Settings for ens3f0:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
[...]
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
[...]
	Link detected: yes
aborrero@cloudvirt1009:~ $ sudo ethtool ens3f1
Settings for ens3f1:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
[...]
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
[...]
	Link detected: yes

Any advice from DC-ops, @RobH ?
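
The check done by eye on the ethtool output above could be scripted roughly as follows. This is only a minimal sketch: the function names and the SAMPLE text are made up for illustration, and it assumes the "Supported link modes:" layout shown in the paste (continuation lines without a colon, next field starting with a "Name:" prefix).

```python
# Hypothetical helper: decide from `ethtool <iface>` text whether any
# 10G link mode is advertised as supported.

SAMPLE = """Settings for ens3f0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Speed: 1000Mb/s
    Duplex: Full
"""


def supported_link_modes(ethtool_output):
    """Collect the entries listed under 'Supported link modes:'."""
    modes = []
    collecting = False
    for line in ethtool_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Supported link modes:"):
            collecting = True
            # Keep only what follows the field name on the first line.
            stripped = stripped.split(":", 1)[1]
        elif collecting and ":" in stripped:
            # The next "Name: value" field ends the link-mode list.
            break
        if collecting:
            modes.extend(tok for tok in stripped.split() if "base" in tok)
    return modes


def supports_10g(ethtool_output):
    """True if at least one supported mode is a 10000base* mode."""
    return any(m.startswith("10000base") for m in supported_link_modes(ethtool_output))


if __name__ == "__main__":
    print(supported_link_modes(SAMPLE))
    print(supports_10g(SAMPLE))
```

For the two interfaces pasted above this would report no 10G support, which matches what prompted the question to DC-ops.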

Event Timeline

aborrero triaged this task as High priority. Feb 16 2019, 2:44 PM
aborrero created this task.
Restricted Application added a project: Operations. Feb 16 2019, 2:44 PM

Steps:

  • Move host to a rack with 10G -- B2, B4 or B7 I believe
  • Enable the 10G NIC in the BIOS (note that we cannot do this via mgmt; it will have to be done in the DC)
  • Move/install cables
  • Update switch config
  • Re-image
Andrew renamed this task from cloudvirt1009: evaluate upgrading to 10G to cloudvirt1009: upgrade to 10G. Feb 20 2019, 9:31 PM
RobH reassigned this task from RobH to Cmjohnson. Feb 20 2019, 9:34 PM
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.

Earlier we tried to enter the BIOS remotely and enable the 10G NIC, and failed (it requires a crash cart). So this is ready for Chris to take over and move the host to the new rack when he has time to do so.

GTirloni removed a subscriber: GTirloni. Mar 21 2019, 9:06 PM
RobH renamed this task from cloudvirt1009: upgrade to 10G to relocate/reimage cloudvirt1009 with 10G interfaces. Apr 4 2019, 5:46 PM
RobH updated the task description.
RobH updated the task description. Apr 4 2019, 5:55 PM
Cmjohnson reassigned this task from Cmjohnson to Andrew. Apr 5 2019, 4:09 PM
Cmjohnson updated the task description.
Cmjohnson added a subscriber: Cmjohnson.

The server has been moved and is ready to install.

Andrew added a comment. Apr 5 2019, 9:17 PM

I reimaged the host and built a canary VM, but the hosted VM cannot access any external networks. I haven't investigated this more deeply yet, but the first step would be to confirm that the instance port is configured the same as the second port on cloudvirt1015 (since that one works as expected).

RobH reassigned this task from Andrew to Cmjohnson. Apr 5 2019, 10:53 PM

So, per @Andrew's request I've checked the switch stack configuration for the secondary 'instance' connections of both cloudvirt1009 (T216324) and cloudvirt1012 (T217346).

This update applies to both, since both have the exact same symptoms and settings, as far as I can tell from the software/remote side.

@Andrew reported that the instance interface (the second 10G interface) on these two hosts (cloudvirt1009 and cloudvirt1012) wasn't coming up, while the identically set up cloudvirt1015 worked fine.

I went ahead and logged into the switch stack, and will paste the output below. In summary, all three ports appear to have the exact same setup: they are in the exact same VLANs/groups and all should work.

Output of switch commands to check that they are all set up identically:

18 $> ssh asw2-b-eqiad.mgmt.eqiad.wmnet
--- JUNOS 14.1X53-D46.7 built 2017-11-23 22:06:48 UTC
{master:2}
robh@asw2-b-eqiad> show interfaces descriptions | grep cloudvirt1015 
xe-2/0/20       up    up   cloudvirt1015-eth0
xe-2/0/21       up    up   cloudvirt1015-eth1

{master:2}
robh@asw2-b-eqiad> show interfaces descriptions | grep cloudvirt1009    
xe-2/0/15       up    up   cloudvirt1009
xe-8/0/27                  cloudvirt1009 eth1

{master:2}
robh@asw2-b-eqiad> show interfaces descriptions | grep cloudvirt1012    
xe-2/0/16       up    up   cloudvirt1012
xe-8/0/28                  cloudvirt1012 eth1

{master:2}
robh@asw2-b-eqiad> edit   
Entering configuration mode
The configuration has been changed but not committed

{master:2}[edit]
robh@asw2-b-eqiad# show interfaces xe-2/0/21 | display inheritance 
description cloudvirt1015-eth1;
##
## '9192' was expanded from interface-range 'cloud-virt-instance-trunk'
##
mtu 9192;
##
## '0' was expanded from interface-range 'cloud-virt-instance-trunk'
##
unit 0 {
    ##
    ## 'ethernet-switching' was expanded from interface-range 'cloud-virt-instance-trunk'
    ##
    family ethernet-switching {
        ##
        ## 'trunk' was expanded from interface-range 'cloud-virt-instance-trunk'
        ##
        interface-mode trunk;
        ##
        ## 'vlan' was expanded from interface-range 'cloud-virt-instance-trunk'
        ##
        vlan {
            ##
            ## 'cloud-instances1-b-eqiad' was expanded from interface-range 'cloud-virt-instance-trunk'
            ## 'cloud-instances2-b-eqiad' was expanded from interface-range 'cloud-virt-instance-trunk'
            ##
            members [ cloud-instances1-b-eqiad cloud-instances2-b-eqiad ];
        }
    }
}

{master:2}[edit]
robh@asw2-b-eqiad# show interfaces xe-8/0/27 | display iner
                                                       ^
syntax error, expecting <command>.
robh@asw2-b-eqiad# show interfaces xe-8/0/27 | display inheritance 
description "cloudvirt1009 eth1";
##
## '9192' was expanded from interface-range 'cloud-virt-instance-trunk'
##
mtu 9192;
##
## '0' was expanded from interface-range 'cloud-virt-instance-trunk'
##
unit 0 {
    ##
    ## 'ethernet-switching' was expanded from interface-range 'cloud-virt-instance-trunk'
    ##
    family ethernet-switching {
        ##
        ## 'trunk' was expanded from interface-range 'cloud-virt-instance-trunk'
        ##
        interface-mode trunk;
        ##
        ## 'vlan' was expanded from interface-range 'cloud-virt-instance-trunk'
        ##
        vlan {
            ##
            ## 'cloud-instances1-b-eqiad' was expanded from interface-range 'cloud-virt-instance-trunk'
            ## 'cloud-instances2-b-eqiad' was expanded from interface-range 'cloud-virt-instance-trunk'
            ##
            members [ cloud-instances1-b-eqiad cloud-instances2-b-eqiad ];
        }
    }
}

{master:2}[edit]
robh@asw2-b-eqiad# show interfaces xe-8/0/28 | display inheritance    
description "cloudvirt1012 eth1";
##
## '9192' was expanded from interface-range 'cloud-virt-instance-trunk'
##
mtu 9192;
##
## '0' was expanded from interface-range 'cloud-virt-instance-trunk'
##
unit 0 {
    ##
    ## 'ethernet-switching' was expanded from interface-range 'cloud-virt-instance-trunk'
    ##
    family ethernet-switching {
        ##
        ## 'trunk' was expanded from interface-range 'cloud-virt-instance-trunk'
        ##
        interface-mode trunk;
        ##
        ## 'vlan' was expanded from interface-range 'cloud-virt-instance-trunk'
        ##
        vlan {
            ##
            ## 'cloud-instances1-b-eqiad' was expanded from interface-range 'cloud-virt-instance-trunk'
            ## 'cloud-instances2-b-eqiad' was expanded from interface-range 'cloud-virt-instance-trunk'
            ##
            members [ cloud-instances1-b-eqiad cloud-instances2-b-eqiad ];
        }
    }
}
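
The comparison done by eye above (three `display inheritance` dumps that should differ only in their description) could be sketched like this. It is a rough illustration only: `effective_config` and the two sample strings are hypothetical, and the sketch simply drops blank lines, the `##` inheritance annotations and the `description` statement before comparing.

```python
# Hypothetical helper: reduce a `show interfaces <port> | display inheritance`
# dump to the statements that matter when comparing two ports.

PORT_A = """description cloudvirt1015-eth1;
##
## '9192' was expanded from interface-range 'cloud-virt-instance-trunk'
##
mtu 9192;
unit 0 {
    family ethernet-switching {
        interface-mode trunk;
    }
}
"""

PORT_B = """description "cloudvirt1009 eth1";
##
## '9192' was expanded from interface-range 'cloud-virt-instance-trunk'
##
mtu 9192;
unit 0 {
    family ethernet-switching {
        interface-mode trunk;
    }
}
"""


def effective_config(display_inheritance_output):
    """Return config statements without annotations or the description."""
    statements = []
    for line in display_inheritance_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith("##"):
            continue  # inheritance annotation, not configuration
        if stripped.startswith("description "):
            continue  # expected to differ per port
        statements.append(stripped)
    return statements


if __name__ == "__main__":
    same = effective_config(PORT_A) == effective_config(PORT_B)
    print("identical apart from description:", same)
```

Feeding it the three full dumps above instead of the shortened samples would be the actual use; the samples are trimmed only to keep the sketch short.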

Since the configuration is identical, but xe-8/0/27 and xe-8/0/28 show no admin/link status in the descriptions output above, I think this is a physical layer issue. We will need @Cmjohnson to check the physical cabling for the following:

  • Physically check and reseat the 10G DAC cable for cloudvirt1009 in xe-8/0/27.
    • You should see the link light come up and the link show up in the switch software; please check this. If it doesn't come up, try another DAC cable.
  • Physically check and reseat the 10G DAC cable for cloudvirt1012 in xe-8/0/28.
    • You should see the link light come up and the link show up in the switch software; please check this. If it doesn't come up, try another DAC cable.
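
The "show up in the switch software" check can be done from the `show interfaces descriptions` output pasted earlier. A minimal sketch, assuming that column layout (interface, admin state, link state, description); the function name and SAMPLE are illustrative only:

```python
# Hypothetical helper: list ports from `show interfaces descriptions`
# whose row carries no admin/link state columns.

SAMPLE = """xe-2/0/20       up    up   cloudvirt1015-eth0
xe-2/0/21       up    up   cloudvirt1015-eth1
xe-2/0/15       up    up   cloudvirt1009
xe-8/0/27                  cloudvirt1009 eth1
xe-2/0/16       up    up   cloudvirt1012
xe-8/0/28                  cloudvirt1012 eth1
"""

STATES = {"up", "down"}


def ports_without_status(descriptions_output):
    """Return interface names whose row lacks admin and link states."""
    missing = []
    for line in descriptions_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("xe-"):
            continue  # skip prompts, blank lines and other noise
        has_status = (
            len(fields) >= 3
            and fields[1] in STATES
            and fields[2] in STATES
        )
        if not has_status:
            missing.append(fields[0])
    return missing


if __name__ == "__main__":
    print(ports_without_status(SAMPLE))
```

After reseating or swapping the DAC cables, both ports should drop out of this list.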
Andrew closed this task as Resolved. Apr 8 2019, 7:51 PM

fixed, pooled, working!