
Order spare cloudvirt SSDs for eqiad
Closed, Declined · Public


We've had a few scares with failing SSDs in cloudvirts -- if we were to lose the wrong two drives in a row we'd suffer loss of user data.

The prudent thing is probably to keep spares on hand so we can replace drives as they fail and avoid two overlapping failures. Unfortunately there's quite a variety of drive sizes in the cloudvirts, so we'd have to keep several different drives around.


Event Timeline

Andrew created this task. Feb 13 2019, 2:25 PM
Restricted Application added a subscriber: Aklapper. Feb 13 2019, 2:25 PM
Andrew added a subscriber: GTirloni. Feb 13 2019, 2:31 PM

@GTirloni suggests that we add a live spare to each cloudvirt to avoid data loss. Seems like a good idea, although in many cases we won't have spare drive bays for this.

aborrero triaged this task as High priority. Feb 14 2019, 12:54 PM
aborrero added a project: Wikimedia-Incident.
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.
aborrero moved this task from On-going to Follow-up on the Wikimedia-Incident board.
aborrero updated the task description. Feb 14 2019, 1:04 PM
Andrew added a comment. Mar 1 2019, 5:19 PM

I'm less sure that we need drives on hand now. We seem to be able to get replacements more-or-less overnight, and adding spare drives to the RAIDs will reduce the urgency of replacement.

GTirloni removed a subscriber: GTirloni. Mar 21 2019, 9:06 PM
marilerr closed this task as Declined. Aug 24 2019, 3:31 AM
JJMC89 reopened this task as Open. Aug 24 2019, 3:34 AM
Andrew closed this task as Declined. Sep 11 2019, 3:31 PM