The VC link between row D fpc1:1/0 and fpc8:1/0 has been flapping, causing connectivity issues to hosts in both racks.
May 2 06:52:08 asw2-d-eqiad fpc1 [EX-BCM PIC] ex_bcm_linkscan_handler: Link 54 UP
May 2 06:52:08 asw2-d-eqiad fpc1 [EX-BCM PIC] phy_40g_cr4_an_status : Port 54 mii_status = 0x88, ll_adv = 0x1, lp_adv = 0x0
May 2 06:52:08 asw2-d-eqiad fpc1 [EX-BCM PIC] phy_40g_cr4_an_status : Port 54 pause resolution = 0, ll_p = 0x0 lp_p = 0x0
May 2 06:52:08 asw2-d-eqiad fpc1 [EX-BCM PIC] ex_bcm_cr4_get_remote_pause: GET REMOTE PAUSE = 0x0, port 54
May 2 06:52:08 asw2-d-eqiad fpc1 BCM Error: API bcm_port_advert_remote_get(device, port, &ablity) at ex_bcm_get_remote_ability:716 -> Operation disabled
May 2 06:52:08 asw2-d-eqiad fpc1 [EX-BCM PIC] ex_bcm_pic_get_an_info: Failed to get the remote ability for Rear QSFP+ PIC port 0
May 2 06:52:08 asw2-d-eqiad vccpd[1756]: Member 1, interface vcp-255/1/0 went down
May 2 06:52:09 asw2-d-eqiad fpc1 [EX-BCM PIC] ex_bcm_pic_ifd_config: vcp-255/1/0, enable - 1
May 2 06:52:09 asw2-d-eqiad vccpd[1756]: JTASK_SIGNAL_UNKNOWN: Ignoring unknown signal SIGVTALRM (26)
May 2 06:52:09 asw2-d-eqiad fpc8 Devrt num_vc_ports == 0 unit: 0 dest-mod: 1
May 2 06:52:09 asw2-d-eqiad fpc8 Devrt num_vc_ports == 0 unit: 0 dest-mod: 2
May 2 06:52:09 asw2-d-eqiad fpc8 Devrt num_vc_ports == 0 unit: 0 dest-mod: 3
May 2 06:52:09 asw2-d-eqiad vccpd[1756]: Member 1, interface vcp-255/1/0 came up
May 2 06:52:09 asw2-d-eqiad vccpd[1756]: JTASK_SIGNAL_UNKNOWN: Ignoring unknown signal SIGVTALRM (26)
May 2 06:55:51 asw2-d-eqiad fpc1 [EX-BCM PIC] ex_bcm_linkscan_handler: Link 54 UP
May 2 06:55:51 asw2-d-eqiad fpc1 [EX-BCM PIC] phy_40g_cr4_an_status : Port 54 mii_status = 0x88, ll_adv = 0x1, lp_adv = 0x0
May 2 06:55:51 asw2-d-eqiad fpc1 [EX-BCM PIC] phy_40g_cr4_an_status : Port 54 pause resolution = 0, ll_p = 0x0 lp_p = 0x0
May 2 06:55:51 asw2-d-eqiad fpc1 [EX-BCM PIC] ex_bcm_cr4_get_remote_pause: GET REMOTE PAUSE = 0x0, port 54
May 2 06:55:51 asw2-d-eqiad fpc1 BCM Error: API bcm_port_advert_remote_get(device, port, &ablity) at ex_bcm_get_remote_ability:716 -> Operation disabled
May 2 06:55:51 asw2-d-eqiad fpc1 [EX-BCM PIC] ex_bcm_pic_get_an_info: Failed to get the remote ability for Rear QSFP+ PIC port 0
May 2 06:55:51 asw2-d-eqiad vccpd[1756]: Member 1, interface vcp-255/1/0 went down
May 2 06:55:52 asw2-d-eqiad vccpd[1756]: JTASK_SIGNAL_UNKNOWN: Ignoring unknown signal SIGVTALRM (26)
May 2 06:55:52 asw2-d-eqiad vccpd[1756]: Member 1, interface vcp-255/1/0 came up
May 2 06:55:52 asw2-d-eqiad vccpd[1756]: JTASK_SIGNAL_UNKNOWN: Ignoring unknown signal SIGVTALRM (26)
May 2 06:55:53 asw2-d-eqiad vccpd[1756]: JTASK_SIGNAL_UNKNOWN: Ignoring unknown signal SIGVTALRM (26)
Disabling the link solved the issue:
asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 0 member 1
fpc1 still has 2 links up (one to fpc2 and one to fpc3), and fpc8 still has links to fpc6 and fpc7, so we still have redundancy.
TODO:
- Add alerting on relevant syslog messages
- Decide whether to:
  - replace this cable and re-enable the port
  - remove the link fully
  - re-cable the row to match a standard VCF
I think we should remove the link fully (as we still have redundancy) and plan the recabling with T196487.
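For the alerting item in the TODO, one possible starting point is to count the vccpd "went down" events per VC port in syslog and flag any port that exceeds a threshold. The sketch below is only an illustration: the regex is written against the vccpd lines quoted in this task, and the function names and the threshold of 2 are hypothetical, not an existing check.

```python
import re
from collections import defaultdict

# Matches vccpd lines of the form quoted in this task, e.g.
# "May 2 06:52:08 asw2-d-eqiad vccpd[1756]: Member 1, interface vcp-255/1/0 went down"
VCP_DOWN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) vccpd\[\d+\]: "
    r"Member (?P<member>\d+), interface (?P<ifname>vcp-\S+) went down"
)


def count_vcp_flaps(lines):
    """Return {(host, member, interface): number of 'went down' events}."""
    flaps = defaultdict(int)
    for line in lines:
        m = VCP_DOWN.match(line.strip())
        if m:
            flaps[(m["host"], int(m["member"]), m["ifname"])] += 1
    return dict(flaps)


def flapping_ports(lines, threshold=2):
    """List ports with at least `threshold` down events (threshold is an assumption)."""
    counts = count_vcp_flaps(lines)
    return sorted(key for key, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    sample = [
        "May 2 06:52:08 asw2-d-eqiad vccpd[1756]: Member 1, interface vcp-255/1/0 went down",
        "May 2 06:52:09 asw2-d-eqiad vccpd[1756]: Member 1, interface vcp-255/1/0 came up",
        "May 2 06:55:51 asw2-d-eqiad vccpd[1756]: Member 1, interface vcp-255/1/0 went down",
        "May 2 06:55:52 asw2-d-eqiad vccpd[1756]: JTASK_SIGNAL_UNKNOWN: Ignoring unknown signal SIGVTALRM (26)",
    ]
    for host, member, ifname in flapping_ports(sample):
        print(f"{host}: member {member} {ifname} is flapping")
```

Only the "went down" lines are counted, so a single down/up pair stays below the default threshold and one-off events do not alert; a real check would also need a time window.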
Current cabling (with fpc1-fpc8 removed):