
WMF4727 hardware issue - disks don't detect in installer
Closed, Resolved, Public

Description

While testing hosts for deploy1001 (T175288) and bast1002, this host was tapped for testing. When @Dzahn attempted to install the OS, the installer did not see the disks.

The disks show up in the BIOS, but the OS install fails to detect them, so this may be another case of bad disks.

@Cmjohnson: please boot a live CD and test the disks in this host.

Details

Related Gerrit Patches:

Event Timeline

RobH triaged this task as Medium priority.Mar 15 2018, 5:40 PM
RobH created this task.
Restricted Application added a project: Operations.Mar 15 2018, 5:40 PM
Restricted Application added a subscriber: Aklapper.
Dzahn awarded a token.Mar 15 2018, 7:01 PM

Replaced both disks. @RobH, please close this task once you have confirmed the issue has been resolved.

Return shipping for both disks in one box

9202 3946 5301 2438 2758 56
9611918 2393026 75025877

Replaced the second disk. @RobH, please close this task once you have confirmed the issue has been resolved.

Return Shipping
9302 3946 5301 2438 2536 63
9611918 2393026 75003684

Dzahn added a comment.Mar 19 2018, 7:12 PM

Can/should I just try to install this one more time, as bast1003, @RobH, or are you already on it?

Dzahn added a comment.Mar 19 2018, 7:12 PM

Its MAC should still be in install_server config.

Dzahn claimed this task.Mar 19 2018, 7:15 PM
Dzahn reassigned this task from Dzahn to RobH.Mar 19 2018, 11:39 PM

Please also see the duplicate task I first created and then merged in (T190093).

The disks are showing up on the server. I can confirm all 4 disks are seen during POST: they are all green, and in the SATA settings they are all on with the correct disks. The mode is set to AHCI.

Dzahn added a comment.Mar 20 2018, 7:34 PM

@RobH I'm thinking that next we should just repeat the install one more time and see if the GRUB install still fails. If it does, we can try with jessie instead of stretch.

Dzahn added a comment.Mar 21 2018, 9:47 PM

Per IRC talk, this already worked when Rob live-hacked for testing. It just needs to be reclaimed.

i just reverted my DNS change in https://gerrit.wikimedia.org/r/#/c/421181/ so there is no more bast1003 and wmf4727 is a spare like before.

done from my side

RobH reassigned this task from RobH to Cmjohnson.Mar 21 2018, 10:29 PM

Added back to spares. The only thing that has to happen is a disk wipe, since these got installed with the new install key.

System is powered off and switch port disabled.

Assigning this to Chris for disk wipe, then he can simply resolve. (I've already added it back to spares tracking.)

Cmjohnson closed this task as Resolved.Mar 22 2018, 3:36 PM

Disks are wiped -- resolving

Dzahn reopened this task as Open.May 31 2018, 11:40 PM

reopening. got this same box assigned as a spare for something different and it has the same issue

Dzahn added a comment.EditedMay 31 2018, 11:46 PM

The issue is back under a new host name, phab1002, on the same hardware; see T196019#4247544.

It's possible that the issue Rob described on this ticket isn't identical to what I describe above, but mine is just like T190093, and we closed that as a duplicate of this one in the past.

Change 436702 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: use raid1-gpt partman recipe for phab1002

https://gerrit.wikimedia.org/r/436702

Change 436702 merged by Dzahn:
[operations/puppet@production] install_server: use raid1-gpt partman recipe for phab1002

https://gerrit.wikimedia.org/r/436702

Dzahn closed this task as Resolved.Jun 1 2018, 12:34 AM

Problem is gone with the raid1-gpt partman recipe, which supports disks over 2 TB (thanks to Papaul for pointing it out, and for pointing out the difference from the 1 TB disks in phab1001).
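
For anyone hitting this later, here is a rough sketch of why the disk size matters. It assumes the earlier recipe wrote an MBR (msdos) partition table, which this ticket does not actually confirm, and it assumes 512-byte logical sectors. MBR stores sector addresses in 32 bits, so it can address at most 2^32 sectors. The function names and the 4 TB example size are hypothetical and only for illustration.

```python
# Sketch only: assumes an MBR partition table and 512-byte logical sectors.
SECTOR_BYTES = 512   # assumption, not taken from the ticket
MBR_LBA_BITS = 32    # MBR partition entries use 32-bit sector addresses


def mbr_max_bytes(sector_bytes=SECTOR_BYTES):
    """Largest capacity an MBR partition table can address, in bytes."""
    return (2 ** MBR_LBA_BITS) * sector_bytes


def needs_gpt(disk_bytes, sector_bytes=SECTOR_BYTES):
    """True if the disk is larger than what MBR can address."""
    return disk_bytes > mbr_max_bytes(sector_bytes)


if __name__ == "__main__":
    # 1 TB matches the phab1001 disks mentioned above; 4 TB is a made-up
    # example of a disk "over 2 TB".
    for label, size in (("1 TB", 10 ** 12), ("4 TB", 4 * 10 ** 12)):
        print(label, "needs GPT:", needs_gpt(size))
```

Under these assumptions the MBR limit works out to 2 TiB, so a recipe without GPT could be fine on the 1 TB disks in phab1001 while still failing on larger disks in this box.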