db2069 crashed due to RAID controller
Closed, Resolved · Public

Authored By

	jcrespo
	Jul 29 2016, 6:49 AM

Description

The IPMI interface is still reachable; connecting to the console shows these kernel messages:

[15108735.009963] systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
[15108760.549599] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 17, t=5488344 jiffies, g=236130750, c=236130749, q=99360)
[15108823.642644] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 11, t=5504100 jiffies, g=236130750, c=236130749, q=99517)
[15108886.735680] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 31, t=5519852 jiffies, g=236130750, c=236130749, q=99680)

This looks like the previous incident, when the RAID controller got disconnected on another host.
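As a rough cross-check of how long CPU 0 had been stuck, the stall messages can be converted into wall-clock time. The sketch below is not part of the original report. It assumes that `t=` counts jiffies since the start of the stalled grace period (as described in the kernel's RCU stall-warning documentation) and that the tick rate can be estimated from how fast `t=` grows between messages. The helper names are hypothetical.

```python
import re

# Matches the uptime stamp and the "t=<jiffies>" field of an rcu_sched stall line.
STALL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\].*rcu_sched detected stalls.*\bt=(\d+) jiffies"
)

# The three stall lines quoted in the description.
LOG = """\
[15108760.549599] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 17, t=5488344 jiffies, g=236130750, c=236130749, q=99360)
[15108823.642644] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 11, t=5504100 jiffies, g=236130750, c=236130749, q=99517)
[15108886.735680] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 31, t=5519852 jiffies, g=236130750, c=236130749, q=99680)
"""


def parse_stalls(log_text):
    """Return (uptime_seconds, stall_jiffies) pairs, one per stall message."""
    return [(float(m.group(1)), int(m.group(2)))
            for m in STALL_RE.finditer(log_text)]


def estimate_hz(samples):
    """Estimate the tick rate from the growth of t= between first and last message."""
    (t0, j0), (t1, j1) = samples[0], samples[-1]
    return (j1 - j0) / (t1 - t0)


def stall_hours(samples):
    """Convert the first message's t= value to hours using the estimated tick rate."""
    return samples[0][1] / estimate_hz(samples) / 3600.0


if __name__ == "__main__":
    samples = parse_stalls(LOG)
    print("estimated HZ:", round(estimate_hz(samples)))
    print("stall duration at first message (hours):", round(stall_hours(samples), 1))
```

On these three lines the estimate comes out at about 250 ticks per second, which puts the stall at roughly six hours old by the time of the first message. That is only an approximation, since it relies on the assumptions stated above.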

Event Timeline

jcrespo created this task. Jul 29 2016, 6:49 AM

Restricted Application added subscribers: Southparkfan, Aklapper. Jul 29 2016, 6:49 AM

jcrespo updated the task description. Jul 29 2016, 6:52 AM

And I was right:

</system1/log1>hpiLO-> show record5

status=0
status_tag=COMMAND COMPLETED
Fri Jul 29 06:35:23 2016



/system1/log1/record5
  Targets
  Properties
    number=5
    severity=Critical
    date=07/29/2016
    time=00:26
    description=Drive Array Controller Failure (Slot 0)
  Verbs
    cd version exit show set
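For checking several hosts, the `key=value` lines of such a record can be parsed mechanically. This is a minimal sketch, not the tooling used in this incident: it assumes the `show` output has already been captured as text, and the function names are hypothetical.

```python
# Record text as pasted above (captured output of "show record5").
RECORD = """\
status=0
status_tag=COMMAND COMPLETED
Fri Jul 29 06:35:23 2016

/system1/log1/record5
  Targets
  Properties
    number=5
    severity=Critical
    date=07/29/2016
    time=00:26
    description=Drive Array Controller Failure (Slot 0)
  Verbs
    cd version exit show set
"""


def parse_ilo_record(text):
    """Collect every key=value line of the captured output into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            props[key] = value
    return props


def is_controller_failure(props):
    """True when the record is Critical and describes a controller failure."""
    return (props.get("severity") == "Critical"
            and "Controller Failure" in props.get("description", ""))


if __name__ == "__main__":
    record = parse_ilo_record(RECORD)
    print(record["date"], record["time"], record["description"])
    print("controller failure:", is_controller_failure(record))
```

The match on the description string is deliberately loose, because the exact wording of other controller events is not known from this task.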

Mentioned in SAL [2016-07-29T06:59:09Z] <jynus> powercycling db2069 T141601

Replication went back to normal, and reliable GTID-based replication avoided corruption. Resolving, but keeping an eye on the data.
