
HP RAID Battery issue on elastic2004
Closed, Resolved (Public)

Description

Icinga reports an HP RAID error:

WARNING: Slot 0: OK: 1I:1:1, 1I:1:2 - Controller: OK - Battery/Capacitor: Recharging
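For anyone scripting around this alert, here is a minimal Python sketch of how a line of this shape could be split into its fields. It assumes the layout seen above (a severity prefix, then parts separated by " - "); the function name `parse_raid_alert` is hypothetical and not part of the Icinga check itself.

```python
def parse_raid_alert(line):
    """Split an alert line of the shape shown above into severity and fields."""
    # Everything before the first ": " is taken as the severity (assumption).
    severity, _, rest = line.partition(": ")
    fields = {}
    # The remaining parts are assumed to be separated by " - ".
    for chunk in rest.split(" - "):
        key, _, value = chunk.partition(": ")
        fields[key.strip()] = value.strip()
    return severity, fields


alert = ("WARNING: Slot 0: OK: 1I:1:1, 1I:1:2 - Controller: OK"
         " - Battery/Capacitor: Recharging")
severity, fields = parse_raid_alert(alert)
print(severity)
print(fields["Battery/Capacitor"])
```

Here the controller field reads OK and only the Battery/Capacitor field carries the state behind the warning.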

This looks similar to T163777 and probably requires changing the battery. @Papaul could you have a look please?

gehel@elastic2004:~$ sudo hpssacli controller slot=0 show detail

Smart Array P440ar in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: PDNLH0BRH8RDNF
   Cache Serial Number: PDNLH0BRH8RDNF
   RAID 6 (ADG) Status: Enabled
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 2.52
   Rebuild Priority: High
   Expand Priority: Medium
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: Yes
   Current Parallel Surface Scan Count: 4
   Max Parallel Surface Scan Count: 16
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: True
   Cache Status: Not Configured
   Cache Ratio: 100% Read / 0% Write
   Read Cache Size: 0 MB
   Write Cache Size: 0 MB
   Drive Write Cache: Disabled
   Total Cache Size: 2.0 GB
   Total Cache Memory Available: 1.8 GB
   No-Battery Write Cache: Disabled
   SSD Caching RAID5 WriteBack Enabled: True
   SSD Caching Version: 2
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: Recharging
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 43
   Cache Module Temperature (C): 38
   Number of Ports: 2 Internal only
   Encryption: Disabled
   Express Local Encryption: False
   Driver Name: hpsa
   Driver Version: 3.4.16
   Driver Supports HPE SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:03:00.0
   Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
   Controller Mode: RAID
   Controller Mode Reboot: Not Required
   Latency Scheduler Setting: Disabled
   Current Power Mode: MaxPerformance
   Host Serial Number: MXQ526082L
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None
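The relevant field in the output above is "Battery/Capacitor Status". As a rough sketch, the "Key: Value" lines could be read into a dictionary like this. The function names are hypothetical, the sample is a shortened copy of the output above, and treating any value other than "OK" as needing attention is an assumption, not documented controller behaviour.

```python
def parse_controller_detail(text):
    """Turn 'Key: Value' lines from the controller detail output into a dict."""
    detail = {}
    for raw in text.splitlines():
        # Split on the first ": " so values such as "0000:03:00.0" stay intact.
        key, sep, value = raw.strip().partition(": ")
        if sep:  # the header line and blank lines have no "Key: Value" shape
            detail[key] = value.strip()
    return detail


def battery_needs_attention(detail):
    # Assumption: anything other than "OK" (including a missing field) is flagged.
    return detail.get("Battery/Capacitor Status") != "OK"


sample = """\
Smart Array P440ar in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: Not Configured
   Cache Ratio: 100% Read / 0% Write
   Battery/Capacitor Status: Recharging
   PCI Address (Domain:Bus:Device.Function): 0000:03:00.0
"""
detail = parse_controller_detail(sample)
print(detail["Battery/Capacitor Status"])
print(battery_needs_attention(detail))
```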

Event Timeline

Gehel created this task. Nov 27 2017, 4:03 PM
Restricted Application added a subscriber: Aklapper.

@Gehel I will be receiving the part tomorrow

Dear Mr Papaul Tshibamba,

Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details are below.

Your request is being worked on under reference number 5324959512
Status: Case is generated and in Progress

Product description: HPE ProLiant DL360 Gen9 E5-2640v3 2.6GHz 8-core 2P 16GB-R P440ar 8 SFF 500W RPS Server/S-Buy
Product number: 780019-S01
Serial number: MXQ526082L
Subject: DL360 Gen9 Server - Battery status recharging

Yours sincerely,

Gehel added a comment. Nov 27 2017, 5:24 PM

That's fast! Thanks! Ping me if you need me to be around when you install it.

I will, because I have to update all the firmware on the server too. I will let you know tomorrow once I have received the part.

Thanks.

@Gehel Tracking information showed 10:30am CT as the delivery time and it is almost 2pm. I contacted UPS and they let me know that, due to the past holidays, the package will not be delivered until 7pm. We will have to postpone the replacement until tomorrow.

Gehel added a comment. Nov 28 2017, 7:31 PM

@Papaul you don't necessarily need me around for that. The only things I would do:

  • depool server
  • schedule downtime
  • shutdown server

Nothing you can't do on your own. But I'm happy to be around if you want!

@Gehel UPS just gave me the wrong timing. I just got email confirmation that the package has arrived on site.

Can you go ahead and shut the server down please?

Thanks

Papaul reassigned this task from Papaul to Gehel. Nov 28 2017, 8:38 PM

Battery replacement complete
Firmware update complete

Mentioned in SAL (#wikimedia-operations) [2017-11-28T20:42:33Z] <gehel> repooling elastic2004 after RAID controller maintenance - T181412

Maintenance has been done and the Icinga check is green again. We can close this.

debt closed this task as Resolved. Dec 8 2017, 8:46 PM