testing: r430 server / h800 controller / md1200 shelf
Closed, Resolved · Public

Description

We are looking to order more labstore machines, but Dell states that we cannot use an older H800 controller card in the new R430 line to drive the older MD1200 shelves. They advise we need to use the new controller, and then a third party, unsupported cable.

Rather than simply take their word for this, we have the hardware already to test it out. (This was previously discussed between @RobH and @Cmjohnson via irc to confirm that last statement.)

As onsite, @Cmjohnson will do the following:

  • Allocate a new spare R430 system temporarily for this test.
  • Install an H800 controller card into a spare R430 system.
  • Install and set up the system with Debian jessie.
  • Attempt to use the disk shelf within the OS.
  • Outline results and tests.

Successful Test:

  • Leave system online so other opsen can run additional tests.

Failed Test:

  • Leave system attached and ensure there are no other tests that can be performed before returning to spares.
  • Once confirmed there are no other tests, wipe and return parts to spares.

This task is a blocker for hardware-requests T126089.

Event Timeline

RobH created this task. Feb 19 2016, 6:10 PM
Restricted Application added a subscriber: Aklapper. Feb 19 2016, 6:10 PM
RobH updated the task description. Feb 19 2016, 6:13 PM

robh: do you want to allocate the server or do you want me to borrow one from spares?
cmjohnson1: just take one that doesn't have a note saying possible allocation on another task
robh you can leave on spares tracking page even, it'll be yours for a very short time i think
robh just note its use on the notes field

RobH mentioned this in Unknown Object (Task). Feb 19 2016, 7:15 PM
RobH added a parent task: Unknown Object (Task).
RobH updated the task description. Feb 22 2016, 9:41 PM

Chris: Please note that I changed the process; the next steps now depend on whether the test succeeds or fails.

Successful Test:

  • Leave system online so other opsen can run additional tests.

Failed Test:

  • Leave system attached and ensure there are no other tests that can be performed before returning to spares.
RobH updated the task description. Feb 22 2016, 9:43 PM

Okay, I will update ticket once completed

RobH added a comment. Feb 23 2016, 12:10 AM

IRC Update:

We should also test using a 6Gbps external SAS controller in an HP DL360. Once the new restbase systems arrive and restbase1001-1006 go to spares, we can use one of them for this test.

The test would ideally try both the Dell H810 and the 3ware SAS 9750-8e. The outcome would determine whether we can use HP systems with Dell disk shelves (this should work).

The card will not work for the server. The controller card is too large for the space and will not fit.

A test server has been set up and is accessible via ssh:
wmf4727-test.eqiad.wmnet

Controller BIOS needs to be added to use the LSI controller.

@RobH: I cannot get the controller BIOS to install. Any suggestions?

RobH added a comment. Mar 7 2016, 11:09 PM

IRC Update: Chris and I chatted about this and all the steps he has taken would typically suffice. The system detects the card, but otherwise isn't working.

It is a random third party controller card we had, so it may itself be suspect. So we would need to purchase a known good third party controller for this test.

Considering the labstore specification seems to have shifted from using shelves to using internal storage, this may no longer be a vital test.

Verdict here is 'no go'?

@chasemp Verdict is no go!

Cmjohnson closed this task as Resolved. Mar 25 2016, 3:16 PM

  1. We would have to use a non-Dell branded low-profile RAID card.
  2. The LSI card I attempted to use was not recognized by the server, and my attempts to install firmware failed.

I chalk this up as something that will not work easily; we would be better off buying the latest versions Dell or HP offer.

faidon reopened this task as Open. Mar 29 2016, 12:15 PM
faidon added a subscriber: faidon.

Re-opening, as that server is still allocated, running and in puppet. Please keep this or another task open for the cleanup (and clean up! :))

I gave it a look, BTW: the (random, as already mentioned) card that is installed right now is not an LSI — it's a 3ware RAID controller:

root@wmf4727-test:~# lspci  | grep RAID
04:00.0 RAID bus controller: 3ware Inc 9750 SAS2/SATA-II RAID PCIe (rev 05)
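
The lspci line above can be split into structured fields, which makes the vendor check less error-prone than eyeballing the output. A minimal Python sketch, assuming the default one-line lspci format shown here; `parse_lspci_line` and `raid_vendor` are hypothetical helper names, and the vendor substrings are illustrative assumptions rather than a complete list:

```python
import re

# Default `lspci` line format: "<slot> <class>: <description> (rev NN)".
# The trailing "(rev NN)" part is treated as optional.
LSPCI_LINE = re.compile(
    r"^(?P<slot>[0-9a-f:.]+)\s+(?P<cls>[^:]+):\s+"
    r"(?P<desc>.*?)(?:\s+\(rev (?P<rev>[0-9a-f]+)\))?$"
)


def parse_lspci_line(line):
    """Return the fields of one lspci line as a dict, or None if it does not match."""
    match = LSPCI_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "slot": match.group("slot"),
        "class": match.group("cls"),
        "description": match.group("desc"),
        "revision": match.group("rev"),
    }


def raid_vendor(description):
    """Guess the controller vendor from the description (assumed substrings)."""
    lowered = description.lower()
    if "3ware" in lowered:
        return "3ware"
    if "lsi logic" in lowered or "megaraid" in lowered:
        return "LSI"
    return None


if __name__ == "__main__":
    line = "04:00.0 RAID bus controller: 3ware Inc 9750 SAS2/SATA-II RAID PCIe (rev 05)"
    info = parse_lspci_line(line)
    print(info["slot"], raid_vendor(info["description"]))
```

Matching on substrings keeps the sketch short; a more careful check would compare PCI vendor IDs instead of description text.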

The 3wares won't work with megacli etc.; we'll need tw-cli for that, which I test-installed on the server:

root@wmf4727-test:~# tw-cli show

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9750-8e      2         2        0       0       1       1      -        

Enclosure     Slots  Drives  Fans  TSUnits  PSUnits  Alarms   
--------------------------------------------------------------
/c0/e0        12     2       4     4        2        1

It /is/ possible to deal with this card and add support for it in check-raid etc., but if this is a one-off, let's please not and stick with our well-known ones.
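
For a sense of what such support would involve, here is a rough Python sketch that parses the `tw-cli show` summary pasted above. It is not the actual check-raid code: `parse_tables` and `controllers_not_optimal` are hypothetical names, and the sketch assumes that each table is a header line followed by a dashed rule, and that a non-zero NotOpt column means the controller has units in a non-optimal state.

```python
SAMPLE = """\
Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9750-8e      2         2        0       0       1       1      -

Enclosure     Slots  Drives  Fans  TSUnits  PSUnits  Alarms
--------------------------------------------------------------
/c0/e0        12     2       4     4        2        1
"""


def parse_tables(text):
    """Split the output into tables; each table is a list of row dicts."""
    lines = [line.rstrip() for line in text.splitlines()]
    tables = []
    i = 0
    while i < len(lines):
        # A header is a non-blank line immediately followed by a dashed rule.
        is_header = (
            lines[i].strip()
            and i + 1 < len(lines)
            and set(lines[i + 1].strip()) == {"-"}
        )
        if not is_header:
            i += 1
            continue
        header = lines[i].split()
        rows = []
        i += 2
        # Rows run until the next blank line (or the end of the output).
        while i < len(lines) and lines[i].strip():
            rows.append(dict(zip(header, lines[i].split())))
            i += 1
        tables.append(rows)
    return tables


def controllers_not_optimal(text):
    """Return controller names whose NotOpt column is non-zero (assumed meaning)."""
    return [
        row["Ctl"]
        for rows in parse_tables(text)
        for row in rows
        if "Ctl" in row and row.get("NotOpt", "0") != "0"
    ]


if __name__ == "__main__":
    print(controllers_not_optimal(SAMPLE))
```

Whitespace splitting works for this summary because no column value contains spaces; per-unit or per-drive listings may need a stricter parser.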

Cmjohnson reassigned this task from Cmjohnson to RobH. Apr 15 2016, 1:39 PM

This card is a one-off and I agree we should stick with what we know. Assigning back to @RobH to comment or resolve.

RobH closed this task as Resolved. Apr 19 2016, 7:03 PM
Restricted Application added a subscriber: TerraCodes. Apr 19 2016, 7:03 PM
MoritzMuehlenhoff reopened this task as Open. May 4 2016, 11:04 AM

Reopening the ticket, since wmf4727-test.eqiad.wmnet is still up and running and hooked into puppet and salt (it's not in site.pp, though). It should be properly removed.

Restricted Application added a subscriber: Southparkfan. May 4 2016, 11:04 AM
RobH reassigned this task from RobH to Cmjohnson. May 4 2016, 4:04 PM

I'm assigning this to Chris, since he spun up the test and will have to do the disk wipe. Since it seems there is agreement to not use odd third party controllers, we don't need this test system online.

Chris: please decom the test machine you used for this, thanks!

Cmjohnson closed this task as Resolved. May 5 2016, 3:31 PM

Removed all production DNS entries.
Removed the dhcpd and netboot.cfg entries.

Released in the Google doc as a spare; resolving task.

MoritzMuehlenhoff reopened this task as Open. May 6 2016, 3:15 PM

But the server is still up (and in puppet/salt)?

root@neodymium:~# salt wmf* cmd.run 'uptime'
wmf4727-test.eqiad.wmnet:

15:12:11 up 71 days, 17:56,  0 users,  load average: 0.24, 0.18, 0.15
faidon added subscribers: akosiaris, Zppix.

Folks, having a wmfNNNN server set up like that for such a long time and not being cleaned up properly is a problem for security and general maintenance reasons. Let's fix it ASAP and not repeat such a thing :)
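
One way to catch this kind of leftover earlier is to scan the salt uptime output for asset-tag hostnames that have been up for a long time. A small Python sketch, assuming output shaped like the paste above (a `host:` line followed by an uptime line); `uptime_days_by_host`, `stale_test_hosts` and the 30-day threshold are hypothetical and only for illustration:

```python
import re

SAMPLE = """\
wmf4727-test.eqiad.wmnet:

15:12:11 up 71 days, 17:56,  0 users,  load average: 0.24, 0.18, 0.15
"""

# A minion header is a single token followed by a colon and nothing else.
HOST_LINE = re.compile(r"^\s*(?P<host>\S+):\s*$")
# Only matches uptimes of at least one day ("up 71 days," or "up 1 day,").
UPTIME_DAYS = re.compile(r"\bup\s+(?P<days>\d+)\s+days?,")
# Hosts still named after their asset tag, e.g. wmf4727-test.
ASSET_TAG_NAME = re.compile(r"^wmf\d+")


def uptime_days_by_host(salt_output):
    """Map each host to its uptime in whole days (hosts up < 1 day are skipped)."""
    result = {}
    host = None
    for line in salt_output.splitlines():
        header = HOST_LINE.match(line)
        if header:
            host = header.group("host")
            continue
        uptime = UPTIME_DAYS.search(line)
        if uptime and host is not None:
            result[host] = int(uptime.group("days"))
    return result


def stale_test_hosts(days_by_host, max_days=30):
    """Return asset-tag-named hosts that have been up longer than max_days."""
    return sorted(
        host
        for host, days in days_by_host.items()
        if ASSET_TAG_NAME.match(host) and days > max_days
    )


if __name__ == "__main__":
    print(stale_test_hosts(uptime_days_by_host(SAMPLE)))
```

Uptime is only a proxy here; a host that was rebooted recently would slip through, so this complements rather than replaces tracking test systems on the task.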

Cmjohnson closed this task as Resolved. May 26 2016, 4:51 PM

Removed from puppet and salt, and wiped the disks. The error was mine; for some reason I didn't think it was ever installed.