Rack/setup sodium (carbon/mirror server replacement)
Closed, ResolvedPublic

Description

  • name mirror1001 (discussed via PM with @RobH and @Cmjohnson)
  • setup network ports
  • mgmt & production dns
  • install_module update - dhcp lease file
  • install_module update - partitioning and netboot.
  • install OS (Jessie)
  • puppet/salt key acceptance
  • service implementation (hand off to @faidon for this step)

Event Timeline

Cmjohnson added a parent task: Unknown Object (Task).Jul 1 2016, 2:56 PM

This will require 10GbE, and also will be an apt-mirror, not our actual apt-server. (That wording was my mistake in earlier tasks.)

So it won't replace carbon entirely, but some of its service(s).

The details on what this system will do are on T137117. I suggest the following:

  • 10GbE rack
  • use element name (don't forget to parse site.pp for ganeti vms using element names)
  • rack anywhere in rows a-c where there is a free 10GbE rack (avoid D for its imminent switch stack upgrade)
  • public vlan
  • raid10 with jessie

No need to — I wouldn't expect more of those. An element name for it would be fine IMHO.

Mgmt and production DNS completed. I added to public vlan and assigned both ipv4 and ipv6.

Only thing missing at this point is preferred partitioning. Please let me know and I will update and install.

Configured w/ RAID 10. Tried installing, but no PXE device was found.

Check cable needed.

Confirmed that PXE is enabled on the 10G NIC and disabled the 1G NICs in the BIOS. The server is connected via fiber w/ SFP+'s; going to remove those and use a DAC cable instead.

faidon renamed this task from Rack/Setup Carbon/Apt Server Replacement to Rack/setup sodium (carbon/mirror server replacement).Jul 22 2016, 4:16 PM
faidon updated the task description.

So, this was simply a case of a misconfigured VLAN on the switch. I fixed that and, with another small hack[1], managed to make the server install.

However, it is currently impossible for the server to boot off the disk: the BIOS simply doesn't list the virtual disk as an option in the boot sequence. Both @RobH and I tried a few different things.

The symptoms seem similar to this:
https://arstechnica.com/civis/viewtopic.php?f=21&t=1316257

I tried upgrading the controller's BIOS from 25.4.0.0015 to 25.4.1.0004 and after that tried downgrading to 25.2.2-0004 (and factory resetting the controller, as well as recreating the VD — as RAID5). None of these had any effect.

The next step would be to contact Dell about this — @Cmjohnson could you take care of this? Thanks!

1: The hack would be adding modprobe.blacklist=tg3 after ixgbe.allow_unsupported_sfp=1 and before console=ttyS1,115200n8 in carbon's /srv/tftpboot/jessie-installer/pxelinux.cfg/ttyS1-115200.
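
As a rough illustration of footnote 1, the edit amounts to placing one extra token between two existing ones on the installer's kernel command line. The sketch below is Python and was not run on carbon; the helper name `insert_kernel_arg` and the sample `append` line are made up for illustration, since the real contents of `ttyS1-115200` are not shown in this task.

```python
def insert_kernel_arg(append_line: str, new_arg: str, after: str, before: str) -> str:
    """Insert new_arg directly after the token `after`, which must precede `before`.

    Hypothetical helper: it only manipulates a single whitespace-separated
    command line and knows nothing about pxelinux config syntax beyond that.
    """
    tokens = append_line.split()
    if new_arg in tokens:
        return append_line  # nothing to do, argument is already there
    idx_after = tokens.index(after)    # raises ValueError if the token is missing
    idx_before = tokens.index(before)  # raises ValueError if the token is missing
    if idx_after >= idx_before:
        raise ValueError(f"{after!r} does not come before {before!r}")
    tokens.insert(idx_after + 1, new_arg)
    return " ".join(tokens)


if __name__ == "__main__":
    # Made-up sample line; only the three arguments named in the footnote are real.
    sample = "append ixgbe.allow_unsupported_sfp=1 console=ttyS1,115200n8"
    print(insert_kernel_arg(
        sample,
        "modprobe.blacklist=tg3",
        after="ixgbe.allow_unsupported_sfp=1",
        before="console=ttyS1,115200n8",
    ))
```

The early return keeps the helper idempotent, so running it twice against the same line does not add the blacklist argument a second time.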

A workorder to replace the system board has been issued. Congratulations: Work Order SR933837812 was successfully submitted.

A new system board has been confirmed. Dell will be sending a tech out to me next week.

Your appointment has been scheduled for : 12:00 PM-05:00 PM , Wednesday, August 03, 2016.

Had a new system board installed but issue persists. Need more time to troubleshoot.

PowerEdge Expandable RAID Controller BIOS
Copyright(c) 2014 LSI Corporation
Press <Ctrl><R> to Run Configuration Utility
HA -0 (Bus 1 Dev 0) PERC H730 Mini
FW package: 25.2.2-0004

0 Non-RAID Disk(s) found on the host adapter
0 Non-RAID Disk(s) handled by BIOS

1 Virtual Drive(s) found on the host adapter.

0 Virtual Drive(s) handled by BIOS

Requested a new RAID controller. Found that we're not the only ones w/ this problem: https://arstechnica.com/civis/viewtopic.php?f=21&t=1316257
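
For anyone wanting to spot this condition in a captured serial console log, a small Python sketch follows. It assumes the banner wording matches the output pasted above exactly; `perc_boot_summary` is a hypothetical helper, not a Dell tool.

```python
import re

# Patterns copied from the banner wording in this task; other firmware
# versions may phrase these lines differently (assumption).
FOUND_RE = re.compile(r"(\d+) Virtual Drive\(s\) found on the host adapter")
HANDLED_RE = re.compile(r"(\d+) Virtual Drive\(s\) handled by BIOS")


def perc_boot_summary(banner: str) -> dict:
    """Return how many virtual drives were found vs. handled by the BIOS."""
    found = FOUND_RE.search(banner)
    handled = HANDLED_RE.search(banner)
    if not found or not handled:
        raise ValueError("banner does not contain the expected Virtual Drive lines")
    n_found = int(found.group(1))
    n_handled = int(handled.group(1))
    # "unhandled" counts drives the controller sees but the BIOS will not offer for boot.
    return {"found": n_found, "handled": n_handled, "unhandled": n_found - n_handled}
```

A non-zero `unhandled` count corresponds to the symptom described here: the controller sees the VD, but it never shows up in the boot sequence.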

Cmjohnson added a parent task: Unknown Object (Task).Aug 11 2016, 2:59 PM

Replaced the broken cable. During POST I am still getting the same message that the VD is not handled by the BIOS.

Updated firmware for both BIOS and controller. Still not able to see the RAID in the BIOS.

Created a new work order to have a technician come to the data center and troubleshoot.

Spoke with Dell support technician Robert Thaler today. We went over some things that were already done, and he's also stumped by the issue. He did state that there have been numerous issues with the 4k drives. Dell's suggestion is to set the controller to HBA mode and use software RAID.

The RAID controller is actually useful and expensive. They sold us this system, in this configuration, with a RAID controller (w/ a BBU) and those specific disks. Can you circle back with them and demand they fix this for us? Cc: @RobH (in case we need to involve our sales rep too)

The Dell tech did look and told me there are non-4k 6TB disks we could use:
http://accessories.ap.dell.com/sna/productdetail.aspx?c=au&l=en&s=dhs&cs=audhs1&sku=400-ALDU

Should we talk with Dell about exchanging them?

We'll need our Dell reps looped in, as they did sell us this config and it should work. In addition, we had to buy cables, since one broke diagnosing an issue that they saddled us with.

I've chatted with Chris about this via IRC. He is documenting all the steps taken by tech support into a cohesive email to send over to our Dell reps so they can get involved and solve this issue for us.

In regards to swapping for another 4TB disk(s): I have no preference, except that Dell sold us the config. They'll need to eat the costs of the swaps and need to be held accountable for selling us a bad configuration.

@faidon Would you be okay with 4TB disks instead of the 6TB disks we have now or would you want to go w/ SW raid?

4x4TB + HWRAID would be preferable. In any case Dell should refund us the difference.

I wonder why @RobH and @Cmjohnson are talking about 4TB disks. The current problems are caused by 4k (6TB) disks, and the accessories link given by Cmjohnson mentions a non-4k 6TB disk.

Have you been discussing the use of 4TB disks via another way (e.g. IRC), or is there confusion between 4TB (non-4k?) and 6TB non-4k disks?

@Southparkfan The issue is with the 4k and 512e disks. However, in order to replace them with something other than 4k and 512e disks, we will need to reduce the capacity from 6TB to 4TB. I hope that clears up any confusion.
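
To make the 4k / 512e distinction concrete, here is a hedged Python sketch that classifies a block device by the sector sizes Linux reports in sysfs. The function names are illustrative. Note that behind a hardware RAID controller the OS sees the virtual disk, so the values may not reflect the physical drives themselves.

```python
from pathlib import Path


def classify_sector_format(logical: int, physical: int) -> str:
    """Map (logical, physical) sector sizes in bytes to the usual format name."""
    if (logical, physical) == (512, 512):
        return "512n"   # native 512-byte sectors
    if (logical, physical) == (512, 4096):
        return "512e"   # 4096-byte sectors with 512-byte emulation
    if (logical, physical) == (4096, 4096):
        return "4Kn"    # native 4096-byte sectors
    return "unknown"


def read_sector_sizes(device: str, sysfs_root: str = "/sys/block") -> tuple:
    """Read logical and physical block sizes for e.g. 'sda' from sysfs."""
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical
```

On a Linux host this could be used as `classify_sector_format(*read_sector_sizes("sda"))`, with `sda` standing in for whatever device name applies.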

Still working on getting the disks replaced w/out any costs to us and possibly a refund. This is the latest message:

Chris,

Base on your request below return request order number 935064839, To return this order we submitted a FSR (Financial Services Request) as it is out of policy.

An FSR does not guarantee that the order will be approved for the return. Moreover, I will do all my effort to help on this request.

Request Id 9726888.

I have created a Service Request for this issue. Feel free to contact me if there is any question or doubt.

Regards,

Carlos Brown
Customer Care Analyst
Dell | Dell Business Operations
Carlos_Brown@DellTeam.com
Work Hours: Monday to Friday: 8am-6pm
Customer feedback | How am I doing? Please contact my manager Claudia_taboada@dell.com

  • Please do not remove your unique tracking number! ------

<<#3075-36464440#>>

Received the new disks from Dell and installed them, set to RAID; the new disks are now handled by the BIOS.

faidon triaged this task as Medium priority.
faidon updated the task description.

Thanks Chris. I installed the system, reconfigured BIOS etc.; system is installed and up and running now.