cr2-codfw> show system alarms
4 alarms currently active
Alarm time               Class  Description
2021-07-21 18:14:31 UTC  Minor  FPC 0 Temp Sensor Fail
2021-07-21 17:54:23 UTC  Major  FPC 0 Hard errors
2021-07-21 17:53:48 UTC  Major  FPC 0 offlined due to unreachable destinations
2021-07-21 17:53:38 UTC  Major  FPC 0 has unreachable destinations
Current status: redundancy is lost to all the codfw rows, since cr2's links to the row switches are all on FPC 0:
re0.cr2-codfw> show interfaces descriptions
Interface       Admin Link Description
et-0/0/0                   Core: asw-a-codfw:et-7/0/52 {#10706}
et-0/0/1                   Core: asw-b-codfw:et-7/0/52 {#10707}
et-0/2/0                   Core: asw-c-codfw:et-7/0/52 {#10708}
et-0/2/1                   Core: asw-d-codfw:et-7/0/52 {#10709}
Logs are full of:
Jul 21 18:00:00 re0.cr2-codfw kernel: Resil 12316 IIC-SIG sent 12:54:a1:00 00000000
Jul 21 18:00:00 re0.cr2-codfw chassisd[12316]: CHASSISD_I2CS_READBACK_ERROR: Readback error from I2C slave for FPC 0 ([0x12, 0x23] -> 0x0)
Jul 21 18:00:00 re0.cr2-codfw kernel: PCF8584(WR): target ack failure on byte 0
Jul 21 18:00:00 re0.cr2-codfw kernel: PCF8584(WR): (i2c_s1=0x08, group=0x12, device=0x54)
Jul 21 18:00:00 re0.cr2-codfw kernel: Resil 12316 IIC-SIG sent 12:54:23:00 00000000
Jul 21 18:00:00 re0.cr2-codfw kernel: PCF8584(WR): target ack failure on byte 0
Jul 21 18:00:00 re0.cr2-codfw kernel: PCF8584(WR): (i2c_s1=0x08, group=0x12, device=0x54)
Jul 21 18:00:00 re0.cr2-codfw kernel: Resil 12316 IIC-SIG sent 12:54:23:00 00000000
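To put a number on "full of", a rough tally of the log lines by process and message tag can help when attaching evidence to a vendor case. The following is only a minimal sketch: the function name `summarize` and the keying heuristic (process name plus the first token of the message) are assumptions made for illustration, and the sample input is just the lines quoted above rather than a full log file.

```python
import re
from collections import Counter

# Sample lines copied from this ticket; in practice, read a saved log file instead.
SAMPLE = """\
Jul 21 18:00:00 re0.cr2-codfw kernel: Resil 12316 IIC-SIG sent 12:54:a1:00 00000000
Jul 21 18:00:00 re0.cr2-codfw chassisd[12316]: CHASSISD_I2CS_READBACK_ERROR: Readback error from I2C slave for FPC 0 ([0x12, 0x23] -> 0x0)
Jul 21 18:00:00 re0.cr2-codfw kernel: PCF8584(WR): target ack failure on byte 0
Jul 21 18:00:00 re0.cr2-codfw kernel: PCF8584(WR): (i2c_s1=0x08, group=0x12, device=0x54)
Jul 21 18:00:00 re0.cr2-codfw kernel: Resil 12316 IIC-SIG sent 12:54:23:00 00000000
Jul 21 18:00:00 re0.cr2-codfw kernel: PCF8584(WR): target ack failure on byte 0
Jul 21 18:00:00 re0.cr2-codfw kernel: PCF8584(WR): (i2c_s1=0x08, group=0x12, device=0x54)
Jul 21 18:00:00 re0.cr2-codfw kernel: Resil 12316 IIC-SIG sent 12:54:23:00 00000000
"""

# "Mon DD HH:MM:SS host process[pid]: message" (the pid part is optional).
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+(?P<host>\S+)\s+"
    r"(?P<proc>[^\s\[:]+)(?:\[\d+\])?:\s+(?P<msg>.+)$"
)


def summarize(text):
    """Count log lines per (process, first message token).

    Hypothetical helper: the first token is only a crude stand-in
    for a message type, chosen to keep the sketch short.
    """
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unparseable lines
        tokens = match.group("msg").split()
        if not tokens:
            continue
        counts[(match.group("proc"), tokens[0].rstrip(":"))] += 1
    return counts


if __name__ == "__main__":
    for (proc, tag), n in summarize(SAMPLE).most_common():
        print(f"{n:4d}  {proc}  {tag}")
```

Keying on the first token keeps the I2C write failures (`PCF8584(WR)`) grouped together regardless of the byte values that follow; a stricter grouping would need message-specific patterns.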
Tried to restart the FPC, but the request is rejected:
re0.cr2-codfw> request chassis fpc slot 0 restart
FPC 0 is in transition, try again
The FPC seems stuck in a reboot loop, alternating between Present and Offline:
ayounsi@re0.cr2-codfw> show chassis fpc 0
                     Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
  0  Present          Absent

{master}
ayounsi@re0.cr2-codfw> show chassis fpc 0
                     Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
  0  Offline         ---Unresponsive---
Opening a JTAC case for possible RMA.