
es2031 crashed (es2)
Closed, Resolved, Public

Description

es2031 (es2) just crashed and got rebooted.
It seems to me this is a hardware issue related to a disk or disk slot:

from racadm getsel
-------------------------------------------------------------------------------
Record:      2
Date/Time:   06/06/2022 12:36:35
Source:      system
Severity:    Critical
Description: A bus fatal error was detected on a component at slot 4.
-------------------------------------------------------------------------------
Record:      3
Date/Time:   06/06/2022 12:36:35
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      4
Date/Time:   06/06/2022 12:36:35
Source:      system
Severity:    Critical
Description: A fatal error was detected on a component at bus 174 device 0 function 0.
-------------------------------------------------------------------------------
Record:      5
Date/Time:   06/06/2022 12:36:35
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
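Record 4 names the failing component by PCI bus, device, and function. A minimal sketch of converting that to the hexadecimal address form that `lspci` prints, assuming the iDRAC reports these numbers in decimal (an assumption, not confirmed in this task) and that the device sits in PCI domain 0. The helper name `lspci_address` is hypothetical:

```python
def lspci_address(bus: int, device: int, function: int, domain: int = 0) -> str:
    """Format decimal bus/device/function numbers as an lspci-style address.

    Assumption: the numbers in the SEL message are decimal and the
    device is in PCI domain 0. Neither is confirmed by the task itself.
    """
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


# "bus 174 device 0 function 0" from SEL record 4
print(lspci_address(174, 0, 0))  # 0000:ae:00.0
```

If that assumption holds, `lspci -s ae:00.0` on the host would show which device the error refers to.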
from racadm lclog view
--------------------------------------------------------------------------------
SeqNumber       = 351
Message ID      = SYS1003
Category        = Audit
AgentID         = DE
Severity        = Information
Timestamp       = 2022-06-06 12:37:07
Message         = System CPU Resetting.
FQDD            = iDRAC.Embedded.1#HostPowerCtrl
--------------------------------------------------------------------------------
SeqNumber       = 350
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 11 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 11 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.11:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 349
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 10 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 10 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.10:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 348
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 9 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 9 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.9:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 347
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 8 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 8 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.8:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 346
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 7 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 7 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.7:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 345
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 6 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 6 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.6:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 344
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 5 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 5 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.5:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 343
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 4 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 4 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.4:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 342
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 3 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 3 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.3:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 341
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 2 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 2 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.2:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 340
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 1 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 1 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.1:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 339
Message ID      = PDR8
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 0 in Backplane 1 of RAID Controller in Slot 4 is inserted.
Message Arg   1 = Disk 0 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.0:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 338
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 11 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 11 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.11:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 337
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 10 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 10 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.10:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 336
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 9 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 9 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.9:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 335
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 8 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 8 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.8:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 334
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 7 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 7 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.7:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 333
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 6 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 6 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.6:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 332
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 5 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 5 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.5:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 331
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 4 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 4 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.4:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 330
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 3 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 3 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.3:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 329
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 2 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 2 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.2:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 328
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 1 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 1 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.1:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 327
Message ID      = PDR4
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-06 12:37:06
Message         = Disk 0 in Backplane 1 of RAID Controller in Slot 4 returned to a ready state.
Message Arg   1 = Disk 0 in Backplane 1 of RAID Controller in Slot 4
FQDD            = Disk.Bay.0:Enclosure.Internal.0-1:RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 326
Message ID      = CPU9000
Category        = System
AgentID         = SEL
Severity        = Information
Timestamp       = 2022-06-06 12:36:37
Message         = An OEM diagnostic event occurred.
RawEventData    = 0x05,0x00,0x02,0xD3,0xF4,0x9D,0x62,0xB1,0x00,0x04,0xC1,0x1A,0x7E,0x01,0x4C,0x20

FQDD            = System.Embedded.1
--------------------------------------------------------------------------------
SeqNumber       = 325
Message ID      = PCI1318
Category        = System
AgentID         = SEL
Severity        = Critical
Timestamp       = 2022-06-06 12:36:36
Message         = A fatal error was detected on a component at bus 174 device 0 function 0.
Message Arg   1 = 174
Message Arg   2 = 0
Message Arg   3 = 0
RawEventData    = 0x04,0x00,0x02,0xD3,0xF4,0x9D,0x62,0xB1,0x00,0x04,0x13,0x38,0x6F,0xAC,0x00,0xAE

FQDD            = PCI.Embedded.1
--------------------------------------------------------------------------------
SeqNumber       = 324
Message ID      = CPU9000
Category        = System
AgentID         = SEL
Severity        = Information
Timestamp       = 2022-06-06 12:36:36
Message         = An OEM diagnostic event occurred.
RawEventData    = 0x03,0x00,0x02,0xD3,0xF4,0x9D,0x62,0xB1,0x00,0x04,0xC1,0x1A,0x7E,0x00,0x00,0x01

FQDD            = System.Embedded.1
--------------------------------------------------------------------------------
SeqNumber       = 323
Message ID      = PCI1360
Category        = System
AgentID         = SEL
Severity        = Critical
Timestamp       = 2022-06-06 12:36:35
Message         = A bus fatal error was detected on a component at slot 4.
Message Arg   1 = 4
RawEventData    = 0x02,0x00,0x02,0xD3,0xF4,0x9D,0x62,0xB1,0x00,0x04,0x13,0x18,0x6F,0xAA,0x00,0x84

FQDD            = PCI.Embedded.1
--------------------------------------------------------------------------------
SeqNumber       = 322
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-04 09:03:43
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 321
Message ID      = CTL37
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-04 02:59:22
Message         = A Patrol Read operation started for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 320
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-05-28 08:59:37
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 319
Message ID      = CTL37
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-05-28 02:59:30
Message         = A Patrol Read operation started for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 318
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-05-21 09:06:52
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 317
Message ID      = CTL37
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-05-21 02:59:33
Message         = A Patrol Read operation started for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 316
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-05-14 08:58:13
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 315
Message ID      = CTL37
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-05-14 02:59:37
Message         = A Patrol Read operation started for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 314
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-05-07 09:05:14
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 313
Message ID      = CTL37
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-05-07 02:59:42
Message         = A Patrol Read operation started for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 312
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-04-30 09:10:47
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 311
Message ID      = CTL37
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-04-30 02:59:45
Message         = A Patrol Read operation started for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 310
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-04-23 09:24:14
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 309
Message ID      = CTL37
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-04-23 02:59:50
Message         = A Patrol Read operation started for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 308
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-04-16 08:59:32
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 307
Message ID      = CTL37
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-04-16 02:59:55
Message         = A Patrol Read operation started for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 306
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-04-09 08:44:39
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
SeqNumber       = 305
Message ID      = CTL37
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-04-09 03:00:00
Message         = A Patrol Read operation started for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
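Most of the lclog output above is informational (Patrol Read runs, disks returning to a ready state). A minimal sketch of pulling out only the Critical records from text in this layout, assuming records are delimited by dash-only lines and fields use `key = value` as shown above. The function names and the `SAMPLE` excerpt are illustrative and not part of any Dell tooling:

```python
from typing import Dict, List


def parse_lclog(text: str) -> List[Dict[str, str]]:
    """Split `racadm lclog view` style text into one dict per record."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        # A line made only of dashes closes the current record.
        if stripped and set(stripped) == {"-"}:
            if current:
                records.append(current)
                current = {}
            continue
        if "=" not in stripped:
            continue  # blank lines, e.g. the one after RawEventData
        key, _, value = stripped.partition("=")
        # Collapse padding so "Message Arg   1" becomes "Message Arg 1".
        current[" ".join(key.split())] = value.strip()
    if current:
        records.append(current)
    return records


def critical_records(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only records whose Severity field is exactly 'Critical'."""
    return [r for r in records if r.get("Severity") == "Critical"]


# Two records copied from the output above (SeqNumber 325 and 322).
SAMPLE = """\
--------------------------------------------------------------------------------
SeqNumber       = 325
Message ID      = PCI1318
Category        = System
AgentID         = SEL
Severity        = Critical
Timestamp       = 2022-06-06 12:36:36
Message         = A fatal error was detected on a component at bus 174 device 0 function 0.
Message Arg   1 = 174
Message Arg   2 = 0
Message Arg   3 = 0
RawEventData    = 0x04,0x00,0x02,0xD3,0xF4,0x9D,0x62,0xB1,0x00,0x04,0x13,0x38,0x6F,0xAC,0x00,0xAE

FQDD            = PCI.Embedded.1
--------------------------------------------------------------------------------
SeqNumber       = 322
Message ID      = CTL38
Category        = Storage
AgentID         = iDRAC
Severity        = Information
Timestamp       = 2022-06-04 09:03:43
Message         = The Patrol Read operation completed for RAID Controller in Slot 4.
Message Arg   1 = RAID Controller in Slot 4
FQDD            = RAID.Slot.4-1
--------------------------------------------------------------------------------
"""

for rec in critical_records(parse_lclog(SAMPLE)):
    print(rec["Timestamp"], rec["Message ID"], rec["Message"])
```

Run against the full output above, this kind of filter would leave only the PCI1360 and PCI1318 entries from 2022-06-06, which are the two events relevant to the crash.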

Event Timeline

Volans triaged this task as High priority. Jun 6 2022, 12:51 PM
Volans created this task.

Mentioned in SAL (#wikimedia-operations) [2022-06-06T12:59:24Z] <volans@cumin1001> dbctl commit (dc=all): 'es2031 crashed T309977', diff saved to https://phabricator.wikimedia.org/P29436 and previous config saved to /var/cache/conftool/dbconfig/20220606-125923-volans.json

Marostegui added subscribers: Papaul, Marostegui.

@Papaul can we contact Dell about this and get some advice? Checking the disk controller logs, I haven't found anything relevant.

Change 803271 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2031: Disable notifications

https://gerrit.wikimedia.org/r/803271

As these hosts do not have replication, I am leaving MySQL stopped for now in case Papaul needs to do some reboots or firmware upgrades.
@Papaul if you need to power off or reboot this host, feel free to do it at your own convenience.

Change 803271 merged by Marostegui:

[operations/puppet@production] es2031: Disable notifications

https://gerrit.wikimedia.org/r/803271

2022-06-06 12:36:35  PCI1360  A bus fatal error was detected on a component at slot 4.
Log Sequence Number: 323
Detailed Description: System performance may be degraded, or system may fail to operate.
Recommended Action: Update component drivers and power cycle the system. If device is removable, re-install the device.

According to Dell: "The error that you're seeing means that something across the PCI bus is having communication issues. I would recommend double checking that your PCI devices are up to date, as well as the BIOS and iDRAC firmware. You would also want to reseat the risers and PCI cards and to perform a power drain of the system. If it continues after that, there may be a device in the server that is an issue."

https://www.dell.com/support/manuals/en-us/integrated-dell-remote-access-cntrllr-8-with-lifecycle-controller-v2.00.00.00/eemi_13g-v1/pci-event-messages?guid=guid-b22e470e-adc2-4ef4-ac82-98df81dc1dff&lang=en-us

PCI1360

Message
    A bus fatal error was detected on a component at slot arg1 . 
Arguments

        arg1 = number

Detailed Description
    System performance may be degraded, or system may fail to operate. 
Recommended Response Action
    Cycle input power, update component drivers, if device is removable, re-install the device. 
Category
    System Health 
Subcategory
    PCI = PCI Device 
Severity
    Severity 1 (Critical)
Trap/EventID
    2417
LCD Message
    Bus fatal error on slot <number>. Reseat PCI card
Initial Default
    LC Log
Server Administrator Event ID
    Not Applicable
Server Administrator Trap ID
    Not Applicable

Firmware upgrade done for:

  • BIOS
  • iDRAC
  • Backplane 1

Power drain performed on the server.

@Marostegui we can repool the server for now, since all the firmware has been upgraded per Dell's website. If we see the issue again, we can open a case.

Thanks.

Sounds good, Papaul. I will start MySQL again then.

Mentioned in SAL (#wikimedia-operations) [2022-06-07T05:15:25Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Repool es2031 T309977', diff saved to https://phabricator.wikimedia.org/P29449 and previous config saved to /var/cache/conftool/dbconfig/20220607-051525-marostegui.json

Closing as fixed, we'll reopen if it crashes again.