asw-c-eqiad has rebooted on its own 4 times so far:
May 16 16:40:35 FPC 8 removed
May 16 19:05:43 FPC 8 removed
May 16 20:27:16 FPC 8 removed
May 16 21:37:02 FPC 8 removed
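To see how closely spaced the reboots are, the gaps between the "FPC 8 removed" timestamps can be computed with a short script. This is a minimal sketch: the function name `reboot_intervals` is hypothetical, and because the log lines carry no year, an arbitrary placeholder year is assumed purely so the timestamps parse.

```python
from datetime import datetime

# Timestamps copied from the "FPC 8 removed" log entries in this report.
REBOOTS = [
    "May 16 16:40:35",
    "May 16 19:05:43",
    "May 16 20:27:16",
    "May 16 21:37:02",
]


def reboot_intervals(stamps, year=2000):
    """Return the time between consecutive reboots as timedelta objects.

    The year is an assumption (the log lines do not include one); it only
    matters if the events span a year boundary. Month-name parsing (%b)
    assumes an English/C locale.
    """
    times = [datetime.strptime(f"{year} {s}", "%Y %b %d %H:%M:%S") for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for delta in reboot_intervals(REBOOTS):
        print(delta)
```

For the four events listed, this prints 2:25:08, 1:21:33 and 1:09:46. Three gaps are too few to call a trend, but they show the reboots are recurring at intervals of one to two and a half hours.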
Each reboot happens after log entries indicating that PEM1 is flapping:
send: red alarm set, device Power Supply 41, reason FPC 8 PEM 1 is not powered
The current theory is that PEM0 reports as healthy but is not, so when PEM1 started flapping, it caused the switch member to reboot.
Production hosts on the switch stack:
cp1099 cp1055 cp1054 cp1053 cp1052 cp1051 cp1050 cp1049 cp1048 cp1047 cp1046 cp1045
paravoid> 2/4 of misc, 4/8 of text and 4/11 of upload are there.
The reboot also caused the following alarm:
Class  Description
Major  Upgrade bank is empty or corrupted for FPC 8, please do standard upgrade sequence
The fix would mean re-applying Junos 11.4R6.5 and restarting the switch member. I don't believe the alarm currently causes any production issue (I have seen switches run fine for a long time with that error), but the fix would cause ~10-20 min of downtime.
The suggested first step to fix the reboot issue is to replace both PEM1 and PEM0.