Q2:(Need By: TBD) replace mr1-eqiad
Closed, Resolved (Public)

Description

This task will track the replacement of mr1-eqiad SRX220 with a new mr1-eqiad SRX300.

Hostname / Racking / Installation Details

This will be a swap out of the existing mgmt router.

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

mr1 SRX300:

  • - receive in system on procurement task T292013 & in coupa
  • - prepare router for racking with rackmount kit, label accordingly
  • - netops performs software backup and wipe of existing mr1 SRX220 device
  • - note the port assignments in netbox for ease of duplication later for new SRX300
  • - decommission the existing mr1-eqiad SRX220 in netbox
  • - enter new mr1-eqiad SRX300 into netbox, duplicate port setup from old mr1 as needed
  • - rack new mr1, connect all cables including scs, confirm scs functionality
  • - handoff to netops to finish setup and resolve this task
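The "note the port assignments" and "duplicate port setup" steps above could be sketched roughly as follows. This is a minimal illustration only, assuming the old device's port assignments have already been written down as a plain mapping; the function name `plan_port_duplication` and all sample data are hypothetical, and nothing here calls the real Netbox API.

```python
def plan_port_duplication(old_ports, new_interfaces):
    """Split the old device's port assignments into those that can be copied
    one-to-one onto the new device and those that need manual review."""
    available = set(new_interfaces)
    copied = {}
    unmatched = {}
    for interface, peer in old_ports.items():
        if interface in available:
            # Same interface name exists on the new device: reuse the assignment.
            copied[interface] = peer
        else:
            # No interface with this name on the new device: flag for review.
            unmatched[interface] = peer
    return copied, unmatched


# Hypothetical example data, not taken from the real mr1-eqiad setup.
old_ports = {
    "ge-0/0/0": "uplink-a",
    "ge-0/0/1": "uplink-b",
    "ge-0/0/9": "spare",
}
new_interfaces = ["ge-0/0/0", "ge-0/0/1", "ge-0/0/2"]

copied, unmatched = plan_port_duplication(old_ports, new_interfaces)
print("copy as-is:", copied)
print("needs review:", unmatched)
```

Keeping the unmatched ports in a separate mapping makes it obvious which cables need a decision before the old device is decommissioned.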

Related Objects

Status    Subtype  Assigned   Task
Open               None
Resolved           Cmjohnson

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Procurement on the ops-eqiad board.
RobH removed a subscriber: RobH.

Racked and cabled; updated Netbox with the connections.

@Jclark-ctr can you sync up with me over IRC so I can give you the Junos image and config to put on a USB drive?
And please remove the cable from ge-0/0/1 for now.

Cable has been removed; pinged on IRC.

We synced up on IRC.

The SCS port was not configured; IMHO that's something DCops should do.

Once done, looks like the device is stuck in a reboot loop with:

SPI stage 1 bootloader (Build time: Apr 26 2020 - 21:42:44)


U-Boot 2013.07-JNPR-3.9 (Build time: Apr 26 2020 - 21:42:45)

Octeon unique ID: 04810801447ca59e0297
N0.LMC0 Configuration Completed: 4096 MB
SRX_300 board revision major:1, minor:10, serial #: CV4121AN1632
OCTEON CN7020-AAP pass 1.2, Core clock: 1200 MHz, IO clock: 600 MHz, DDR clock: 667 MHz (1334 Mhz DDR)
DRAM: 4 GiB
Clearing DRAM...... done
SF: Detected SF with page size 256 Bytes, erase size 64 KiB, total 8 MiB


U-Boot 2013.07-JNPR-3.9 (Build time: Apr 26 2020 - 21:44:35)

Octeon unique ID: 04810801447ca59e0297
Using DRAM size from environment: 4096 MBytes
SRX_300 board revision major:1, minor:10, serial #: CV4121AN1632
OCTEON CN7020-AAP pass 1.2, Core clock: 1200 MHz, IO clock: 600 MHz, DDR clock: 667 MHz (1334 Mhz DDR)
DRAM: 4 GiB
Clearing DRAM...... done
SF: Detected SF with page size 256 Bytes, erase size 64 KiB, total 8 MiB
SATA0: not available
SATA1: not available
PCIe: Port 0 link active, 1 lanes, speed gen2 
PCIe: Link timeout on port 1, probably the slot is empty
PCIe: Port 2 not in PCIe mode, skipping
Net:   octeth0
Node 0 Interface 0 has 1 ports (SGMII)
Boot Media: eUSB usb 
Found TPM SLB9660 TT 1.2 by Infineon
TPM initialized
USB1:   Starting the controller
USB XHCI 1.00
scanning bus 1 for devices... 2 USB Device(s) found
USB0:   Starting the controller
USB XHCI 1.00
scanning bus 0 for devices... 2 USB Device(s) found
       scanning usb for storage devices... xhci_bulk_tx: ring halted
  ep info: 0x2, ep info2: 0x4000016, deq: 0x10f3aec41, tx_info: 0x0
  deq_seg: c0344f88, trb: c03aec00
  ring: c03ad018, buffer: 0x10ff51700, len: 0x1f, flags: 0x421
WARN halted endpoint, queueing URB anyway.
Unexpected XHCI event TRB, type: 33, expected: 32, skipping... (0f3a9c50 00000001 13000000 01008401)
Error: Mismatch slot ID or index, 0 != 1, field: 0x0, index: 0xffffffff, expect 0x3
Warning: transfer comp code 0x0 != 0x1a (COMP_STOP)
BUG: failure at xhci-ring.c:586/abort_td()!
BUG!

Dunno if the USB mention is a red herring or not, but please troubleshoot it or follow up with JTAC, and let me know once I can configure it.
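For triaging a captured console log like the one above, a small script can count U-Boot banners and pull out any `BUG: failure at ...` lines. This is only a rough sketch: the regular expressions are modelled on the lines quoted in this task and are an assumption about the log format, and `summarize_console_log` is a hypothetical helper, not a Juniper tool. The excerpt shows two U-Boot banners with different build times, so the banner count alone is a rough indicator rather than an exact count of reboots.

```python
import re

# Patterns modelled on the console output quoted in this task (assumed format).
UBOOT_BANNER = re.compile(r"^U-Boot \S+ \(Build time: .*\)$")
BUG_LINE = re.compile(
    r"^BUG: failure at (?P<file>[\w.-]+):(?P<line>\d+)/(?P<func>\w+)\(\)!$"
)


def summarize_console_log(text):
    """Count U-Boot banners and collect (file, line, function) for BUG lines."""
    banners = 0
    bugs = []
    for raw in text.splitlines():
        line = raw.strip()
        if UBOOT_BANNER.match(line):
            banners += 1
        match = BUG_LINE.match(line)
        if match:
            bugs.append(
                (match.group("file"), int(match.group("line")), match.group("func"))
            )
    return {"uboot_banners": banners, "bugs": bugs}


# A few lines copied from the log in this task.
sample = """\
SPI stage 1 bootloader (Build time: Apr 26 2020 - 21:42:44)
U-Boot 2013.07-JNPR-3.9 (Build time: Apr 26 2020 - 21:42:45)
U-Boot 2013.07-JNPR-3.9 (Build time: Apr 26 2020 - 21:44:35)
WARN halted endpoint, queueing URB anyway.
BUG: failure at xhci-ring.c:586/abort_td()!
BUG!
"""

summary = summarize_console_log(sample)
print(summary)
```

A repeated BUG location across many captures would suggest the same failure on every boot, which is useful detail to include when following up with JTAC.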

@arzhel I fixed the reboot issue: the external disk attached to the router was causing the reboots. I updated Junos to junos-srxsme-20.2R3-S2.5.tgz and changed the password to a plain-text password. You are good to continue with the replacement process.

Cmjohnson updated the task description.
Cmjohnson added a subscriber: Jclark-ctr.

Swap has been done successfully!

Left to do: wipe the old one, rename the console server port of the new one.

Loaded the configuration file.
Verified working.
Moved cables to the new mr1-eqiad.
Left the SCS connection on the old mr1 so it can be wiped; the SCS connection still needs to be removed and the old device unracked.

ran new replace hardware script, outstanding!

Cmjohnson updated the task description.

Removed from rack; updated SCS.