swift - ms-be2036 - device sdg:4 unavailable
Closed, Resolved (Public)

Description

Here is another one like T291896 but on ms-be2036:

<+icinga-wm> PROBLEM - Check systemd state on ms-be2036 is CRITICAL: CRITICAL - degraded: The following units failed: swift-drive-audit.service

Sep 28 18:01:04 ms-be2036 drive-audit[25483]: Errors found but device unavailable: sdg:4

Event Timeline

Dzahn renamed this task from swift - ms-be2036 - device sdi:6 unavailable to swift - ms-be2036 - device sdg:4 unavailable. Edited. Tue, Sep 28, 6:12 PM
Joe triaged this task as High priority. Wed, Sep 29, 6:32 AM
Joe added a subscriber: Joe.

For the record, sdg has many bad sectors (according to kern.log) and should probably be replaced. Not sure why the alert only fired yesterday though, the bad sector errors have been ongoing for quite some time.

@Papaul please replace the 4TB drive, should be blinking, thank you!

Papaul claimed this task.

@fgiunchedi disk replaced

Icinga all green

Not sure why the alert only fired yesterday though, the bad sector errors have been ongoing for quite some time.

We noticed it only now because just the other day we switched those jobs from crons to systemd timers, so a failed unit now triggers the generic systemd Icinga check.
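
For anyone wondering how a generic check ends up flagging this, here is a minimal sketch of the idea, not the actual check deployed on these hosts. The function names (`parse_failed_units`, `check_status`, `query_systemd`) are hypothetical, the unit description in the usage note is made up, and the message format is only modelled on the alert text quoted in the description. It relies on `systemctl is-system-running` reporting `degraded` when any unit has failed, and on `systemctl list-units --state=failed` listing those units.

```python
import subprocess


def parse_failed_units(systemctl_output: str) -> list[str]:
    """Extract unit names from the text printed by
    `systemctl list-units --state=failed --plain --no-legend`."""
    units = []
    for line in systemctl_output.splitlines():
        fields = line.split()
        # Some systemd versions prefix failed units with a bullet; skip it.
        if fields and fields[0] == "●":
            fields = fields[1:]
        if fields:
            units.append(fields[0])
    return units


def check_status(state: str, failed_units: list[str]) -> tuple[int, str]:
    """Return a Nagios-style (exit code, message) pair.

    Exit code 0 means OK and 2 means CRITICAL. The message layout is an
    assumption modelled on the alert quoted in this task.
    """
    if state == "running" and not failed_units:
        return 0, "OK - running"
    return 2, (
        f"CRITICAL - {state}: The following units failed: "
        + ", ".join(failed_units)
    )


def query_systemd() -> tuple[str, list[str]]:
    """Ask systemd for the overall state and the failed units.

    `is-system-running` exits non-zero when the state is degraded,
    so the return code is deliberately not checked.
    """
    state = subprocess.run(
        ["systemctl", "is-system-running"],
        capture_output=True, text=True,
    ).stdout.strip()
    failed = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True, text=True,
    ).stdout
    return state, parse_failed_units(failed)
```

Calling `check_status("degraded", parse_failed_units("swift-drive-audit.service loaded failed failed Swift drive audit"))` yields exit code 2 with a message naming `swift-drive-audit.service`. A cron job that exits non-zero leaves no failed unit behind, which would explain why nothing alerted before the switch to timers.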