cr1-eqiad Control Board error
Closed, Resolved, Public

Description

After reseating PEM1 on cr1-eqiad, an alarm for Control Board 1 appeared.
Alarm time               Class  Description
2015-02-19 19:28:16 UTC  Major  CB 1 Failure
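
For anyone who wants to track this alarm from a script, here is a minimal Python sketch that parses an alarm line like the one above into a structured record. The names `ChassisAlarm` and `parse_alarm_line` are hypothetical, and the pattern assumes UTC timestamps and only the Major and Minor alarm classes; it is not taken from any Juniper tooling.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class ChassisAlarm(NamedTuple):
    """One row of the alarm table (hypothetical record type)."""
    raised_at: datetime
    severity: str
    description: str


# Assumption: timestamps are always reported in UTC and the class
# column is either "Major" or "Minor". Columns may be separated by
# one or more spaces.
ALARM_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC\s+"
    r"(?P<severity>Major|Minor)\s+"
    r"(?P<description>.+?)\s*$"
)


def parse_alarm_line(line: str) -> Optional[ChassisAlarm]:
    """Return a ChassisAlarm for an alarm row, or None for any other line."""
    match = ALARM_LINE.match(line.strip())
    if match is None:
        return None  # header row, blank line, or unrecognized format
    raised_at = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
    raised_at = raised_at.replace(tzinfo=timezone.utc)
    return ChassisAlarm(raised_at, match["severity"], match["description"])


if __name__ == "__main__":
    print(parse_alarm_line("2015-02-19 19:28:16 UTC Major CB 1 Failure"))
```

The header row does not match the pattern, so it is skipped rather than raising an error.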

A ticket has been opened with Juniper: 2015-0219-0654.

Event Timeline

Cmjohnson claimed this task.
Cmjohnson raised the priority of this task from to Unbreak Now!.
Cmjohnson updated the task description.
Cmjohnson subscribed.

That's likely this: http://kb.juniper.net/InfoCenter/index?page=content&id=KB26731.

We'll need to do either:

  • an RE graceful switchover (traffic affecting, hopefully not much); or
  • restart chassis-control immediately, which has "a small risk that this reconnect could fail, if the restart of the chassis takes longer than expected, which could lead to the FPC restarting and subsequent traffic loss".

Either way, let's see what TAC says. Chris, do not run any of these commands even if they instruct you to; they are possibly traffic affecting.

faidon lowered the priority of this task from Unbreak Now! to High. Feb 19 2015, 8:46 PM
faidon added a project: netops.
faidon set Security to None.

This is the response I received from Juniper regarding the CB1 error, with his suggestion:

Can you set up a MW to switch the CB mastership? The SCB contains circuitry that controls the PEMs. Only the master SCB does this and at this point SCB 1 is master. We can switch mastership of SCB 1 to SCB 0 to see if it’s an issue with CB 0. Please only do this during a Maintenance window as it will cause traffic disruption:

  1. request chassis routing-engine master switch

Regards

Daniel

I ran restart chassis-control immediately and it's fixed now. Our current uptime is 747 days, so we should be good for another 23 or so…