
cloudelastic1002: SMART/disk error
Closed, Resolved, Public

Description

Icinga reports a SMART health problem on the /dev/sdb device of cloudelastic1002: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cloudelastic1002&service=Device+not+healthy+-SMART-

onimisionipe@cloudelastic1002:~$ sudo smartctl -H /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-9-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
201 Power_Loss_Cap_Test     0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 4 (82 6532)
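
For reference, the failed-attribute row above can be read mechanically. The following is a minimal sketch, not part of the original report: it assumes the column layout shown in this smartctl output, and the names `SmartAttribute`, `parse_attributes` and `failing_now` are hypothetical helpers. It shows why attribute 201 is flagged: its normalized VALUE (001) is at or below THRESH (010).

```python
import re
from typing import List, NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int        # normalized value, not the raw counter
    worst: int
    thresh: int
    when_failed: str  # "-" when the attribute has never failed
    raw_value: str


# Columns assumed (from the output above):
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
201 Power_Loss_Cap_Test     0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 4 (82 6532)
"""


def parse_attributes(text: str) -> List[SmartAttribute]:
    """Return one SmartAttribute per table row; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(SmartAttribute(
                int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4)),
                int(m.group(5)), m.group(8), m.group(9).strip(),
            ))
    return rows


def failing_now(attrs: List[SmartAttribute]) -> List[SmartAttribute]:
    """Attributes that smartctl currently marks as failing."""
    return [a for a in attrs if a.when_failed == "FAILING_NOW"]


for attr in failing_now(parse_attributes(SAMPLE)):
    print(f"{attr.attr_id} {attr.name}: value {attr.value} <= thresh {attr.thresh}")
```
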

Event Timeline

Mathew.onipe triaged this task as Medium priority. Aug 8 2019, 2:52 AM
Mathew.onipe created this task.
Cmjohnson moved this task from Backlog to Cloud Tasks on the ops-eqiad board. Aug 15 2019, 3:03 PM

Confirmed: Service Request 1005040764 was successfully submitted. Ordered a replacement drive.

@Mathew.onipe Disk arrived.

@Mathew.onipe @Gehel @aborrero can you sync with @Jclark-ctr when you have a minute to swap the disk?

Mentioned in SAL (#wikimedia-operations) [2019-12-05T20:11:32Z] <onimisionipe> ban cloudelastic1002 from shard allocation - T230088
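
The SAL entry does not say how the ban was applied. One common way to keep shards off a node before a disk swap is Elasticsearch's `cluster.routing.allocation.exclude._name` cluster setting; the sketch below only builds the request body for `PUT /_cluster/settings` and sends nothing. The helper name `allocation_exclude_body` and the node-name pattern `cloudelastic1002*` are assumptions for illustration, since the real node name is not given in this task.

```python
import json
from typing import Dict, List, Optional


def allocation_exclude_body(node_names: List[str], transient: bool = True) -> Dict:
    """Build a /_cluster/settings body that excludes nodes by name.

    An empty list produces a JSON null, which resets the setting and
    lets shards return to the node after the repair.
    """
    scope = "transient" if transient else "persistent"
    value: Optional[str] = ",".join(node_names) if node_names else None
    return {scope: {"cluster.routing.allocation.exclude._name": value}}


# Hypothetical node-name pattern; the actual name is not in the task.
ban = allocation_exclude_body(["cloudelastic1002*"])
unban = allocation_exclude_body([])

print(json.dumps(ban))
print(json.dumps(unban))
```

The same body with a null value undoes the ban once the new disk is in service.
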

Replaced the failed drive.

onimisionipe@cloudelastic1002:~$ sudo smartctl -H /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-9-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

It looks good now. Thanks @Jclark-ctr
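
The before/after comparison in this task comes down to one line of `smartctl -H` output. As a rough sketch (the function name `overall_health` is hypothetical, and the pattern assumes the exact wording printed by the smartctl version shown here), that line can be extracted like this:

```python
import re
from typing import Optional

HEALTH = re.compile(
    r"^SMART overall-health self-assessment test result:\s*(\S+)",
    re.MULTILINE,
)


def overall_health(smartctl_output: str) -> Optional[str]:
    """Return 'PASSED' or 'FAILED', or None if the result line is absent."""
    m = HEALTH.search(smartctl_output)
    # smartctl prints "FAILED!" with a trailing exclamation mark.
    return m.group(1).rstrip("!") if m else None


before = "=== START OF READ SMART DATA SECTION ===\nSMART overall-health self-assessment test result: FAILED!\n"
after = "=== START OF READ SMART DATA SECTION ===\nSMART overall-health self-assessment test result: PASSED\n"

print(overall_health(before), overall_health(after))
```

For scripted checks, the exit status of `smartctl -H` is usually a sturdier signal than the text, since smartctl sets bit 3 of its exit status when the health check reports a failing disk.
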

Mathew.onipe closed this task as Resolved. Dec 6 2019, 9:07 AM