Enabling graceful-switchover on cr1-codfw causes errors in the logs and core dumps.
Opened Juniper case 2018-0403-0831.
Juniper's reply:
During the cleanup process, ksyncd will check for public next hops to make sure that none remain. If ksyncd finds a public next hop left hanging without being cleaned up, it will set an initialization error (KSYNCD_ERROR_INIT), which leads to this connection/initialization error. From the message logs, it looks like ksyncd is facing some issue during NH index cleanup, which looks suspicious. From the RSI, I can see that fxp0 is in a logical system, which is not supported, and hence GRES is not completing correctly.
Please remove the fxp0 from the logical systems and then re-enable GRES.
I followed up, since we have graceful-switchover enabled on other routers that also have fxp0 in a logical system.
Relevant KB entry: https://kb.juniper.net/InfoCenter/index?page=content&id=KB26616
JTAC's opinion on why it works on some routers is that we are being "lucky".
This raises the risk of an RE failure not being handled properly.
Best case: the RE goes down and the redundant router takes all the load.
Worst case: a partial failure where the RE failover fails in a way that blackholes traffic.