
reinstall snapshot1001.eqiad.wmnet with RAID
Closed, Resolved, Public

Description

The snapshot servers don't have RAID configured, per the parent task.

What would it take to change that? Are we talking hardware or software RAID, and which RAID level would make sense?

Event Timeline

I need to decommission snapshot1002 and snapshot1004. snapshot1003 will be decommissioned after the cron jobs are moved off of it, see T133694.

I'd like to keep snapshot1001 around for a while yet as a canary/testbed; it could be reinstalled any way we see fit. It has two 80GB HDDs, but I doubt it has a RAID controller, so we'd be looking at SW RAID. Also, because those are smallish disks, I'm not sure we can use the standard snapshot partman recipe on it.
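As a rough illustration of the "which RAID level" question for snapshot1001's two 80GB disks, the Python sketch below tabulates usable capacity and guaranteed fault tolerance per level. It is a simplified model under stated assumptions: identical disks, no RAID metadata overhead, and conventional minimum disk counts. Linux md can build some degenerate layouts (for example RAID 10 on two disks) that this ignores. `raid_summary` is a hypothetical helper, not part of any tool used on this host.

```python
# Simplified model (assumption): identical disks, RAID metadata/superblock
# overhead ignored, conventional minimum disk counts per level.
DISK_GB = 80   # per-disk size mentioned for snapshot1001
N_DISKS = 2


def raid_summary(level, n, size_gb):
    """Return (usable_gb, guaranteed_failures_tolerated), or None if the
    level conventionally needs more disks than n."""
    if level == 0:
        # Striping only: all capacity, no redundancy.
        return n * size_gb, 0
    if level == 1:
        # Every disk holds a full mirror; survives losing all but one.
        return size_gb, n - 1
    if level == 5:
        if n < 3:
            return None
        # One disk's worth of capacity goes to parity.
        return (n - 1) * size_gb, 1
    if level == 10:
        if n < 4 or n % 2:
            return None
        # Striped mirrors; only one failure is guaranteed survivable.
        return (n // 2) * size_gb, 1
    raise ValueError(f"unsupported RAID level {level}")


if __name__ == "__main__":
    for level in (0, 1, 5, 10):
        result = raid_summary(level, N_DISKS, DISK_GB)
        if result is None:
            print(f"RAID {level}: not possible with {N_DISKS} disks")
        else:
            usable, tolerated = result
            print(f"RAID {level}: {usable} GB usable, "
                  f"survives {tolerated} disk failure(s)")
```

With only two disks, RAID 1 is the only conventional level here that survives a disk failure, at the cost of half the raw capacity, which lines up with the RAID 1 suggestion made in the thread.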

Dzahn removed Dzahn as the assignee of this task. Jul 20 2016, 11:48 PM
ArielGlenn renamed this task from reinstall snapshot100[1234].eqiad.wmnet with RAID to reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4. Jul 21 2016, 8:47 AM
ArielGlenn claimed this task.

snapshot1002, 1003 and 1004 are now ready for decommissioning, as all functionality has been moved off the misc cron host and nothing runs on any of these hosts anymore.

@RobH, do you know whether we have any hosts with a PERC H200 running RAID (I guess RAID 1 would do), and if so, are there any gotchas I need to look out for? snapshot1001 has an H200 and two disks that we want to put into RAID; the question is whether that's best done in hardware or software for this box. Thanks.

RobH added a comment. Aug 1 2016, 5:08 PM

So I looked in a few places:

http://www.dell.com/learn/us/en/04/campaigns/dell-raid-controllers

The H200 seems very, very similar to the H300/H310. Considering that the H310 isn't good enough and is a newer line, together with the H200's lack of a BBU or onboard RAM, I think using the H200 as anything other than a disk controller for JBOD/SW RAID may result in sub-par performance.

We have 8 H710 controllers on the spare shelf, purchased to swap out any aging H310 systems we have in use. The H200 might be replaceable with one of these as well.

I'd suggest SW RAID, or replacing the H200 with one of the spare H710 controllers. Using the H200 as a full-fledged RAID controller may result in sub-optimal performance (as it did with the H310). Since we don't have any other H200 controllers in use (that I am aware of), I can't be certain of the above.

No need to replace it; SW RAID should be just fine for this. Thanks for the info!


No opinion on the H200 yet, but just a note, for what it's worth:

The H310 is a very broken controller indeed. We are not entirely sure what's wrong with the H310 internally, but it's more than likely that the problem is not purely related to the underlying LSI chipset: other controllers with the same chipset are known to work OK. Whatever is wrong with it is likely specific to the H310 (its design, its firmware, or both). We have no evidence to suggest that whatever is broken in the H310 carries over to other similar controllers like the H200.

@faidon What would you suggest? Performance is not critical for this host, as it's going to be a canary/testbed, but it would be nice not to rely on HW that is actually broken. Is it worth using HW RAID to learn more about the performance of this chipset, or do we not have any other servers with H200s to which such learning would apply?

Peachey88 renamed this task from reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 to reinstall snapshot1001.eqiad.wmnet with RAID. Aug 2 2016, 12:10 PM
ArielGlenn closed this task as Resolved. Oct 26 2016, 9:02 AM

Given that this is a testbed host, we can afford to experiment. I've installed it with HW RAID on the H200. We'll see how that goes.