
WMF4727 hardware issue - disks not detected in installer
Closed, ResolvedPublic

Description

While testing hosts for deploy1001 (T175288) and bast1002, this host was tapped for testing. When @Dzahn attempted to install the OS, the installer did not see the disks.

The disks show up in the BIOS, but the OS install fails, so this may be another case of bad disks.

@Cmjohnson: please boot a live CD and test the disks in this host.

Event Timeline

RobH triaged this task as Medium priority. Mar 15 2018, 5:40 PM
RobH created this task.
Restricted Application added a subscriber: Aklapper.

Replaced both disks. @RobH, please close this task once you have confirmed the issue has been resolved.

Return shipping for both disks in one box:

9202 3946 5301 2438 2758 56
9611918 2393026 75025877

Replaced the second disk. @RobH, please close this task once you have confirmed the issue has been resolved.

Return shipping:
9302 3946 5301 2438 2536 63
9611918 2393026 75003684

Can/should I just try to install this one more time, as bast1003, @RobH, or are you already on it?

Its MAC should still be in install_server config.

Please also see the duplicate task I first created and then merged in (T190093).

The disks are showing up on the server. I can confirm all 4 disks are seen during POST, they are all green, and in the SATA settings they are all enabled with the correct disks. The mode is set to AHCI.

@RobH, I'm thinking that next we should just repeat the install one more time and see whether the grub install still fails. If it does, we can try with jessie instead of stretch.

per IRC talk, this already worked when Rob live-hacked for testing. it just needs to be reclaimed.

i just reverted my DNS change in https://gerrit.wikimedia.org/r/#/c/421181/ so there is no more bast1003 and wmf4727 is a spare like before.

done from my side

Added back to spares. The only thing that has to happen is a disk wipe, since these got installed with the new install key.

System is powered off and switch port disabled.

Assigning this to Chris for disk wipe, then he can simply resolve. (I've already added it back to spares tracking.)

Disks are wiped -- resolving

Reopening. I got this same box assigned as a spare for something different, and it has the same issue.

The issue is back under a new hostname, phab1002, on the same hardware; see T196019#4247544.

It's possible that the issue Rob described on this ticket isn't identical to what I describe above, but mine is just like T190093, and we closed that as a duplicate of this one in the past.

Change 436702 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: use raid1-gpt partman recipe for phab1002

https://gerrit.wikimedia.org/r/436702

Change 436702 merged by Dzahn:
[operations/puppet@production] install_server: use raid1-gpt partman recipe for phab1002

https://gerrit.wikimedia.org/r/436702

The problem is gone with the raid1-gpt partman recipe, which supports disks over 2 TB (thanks to Papaul for pointing this out, and for noting the difference from the 1 TB disks in phab1001).
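
For anyone hitting this later, here is a rough sketch of why the disk size matters. It assumes the recipe used before wrote an MBR (msdos) partition table, which this task does not confirm. MBR stores partition offsets and lengths as 32-bit sector counts, so with 512-byte sectors it tops out at 2 TiB, while GPT uses 64-bit counts. The function names and the 4 TB example size are hypothetical and only there for illustration; the actual disk size in phab1002 is not stated here beyond being over 2 TB.

```python
# Sketch only: shows the MBR addressing limit, not what partman does internally.

# MBR stores partition start and length as 32-bit sector counts.
MBR_MAX_SECTORS = 2**32


def mbr_limit_bytes(sector_size: int = 512) -> int:
    """Largest disk size (in bytes) fully addressable with an MBR table."""
    return MBR_MAX_SECTORS * sector_size


def needs_gpt(disk_bytes: int, sector_size: int = 512) -> bool:
    """True if the disk is larger than what MBR can address."""
    return disk_bytes > mbr_limit_bytes(sector_size)


if __name__ == "__main__":
    # Hypothetical sizes: 1 TB as in phab1001, 4 TB as a stand-in for "over 2 TB".
    for label, size in [("1 TB", 10**12), ("4 TB", 4 * 10**12)]:
        print(label, "needs GPT:", needs_gpt(size))
```

With 512-byte sectors the limit works out to 2,199,023,255,552 bytes (2 TiB), so a 1 TB disk fits under it and anything noticeably over 2 TB does not, which would match the behaviour seen on phab1001 versus phab1002.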