labvirt1005 doesn't boot up
Closed, Resolved (Public)

Description

On reboot for T99738, the machine fails to boot with: "ALERT! /dev/disk/by-uuid/861a4750-9243-4da7-b566-8c3cebfd6114 does not exist. Dropping to a shell!"

Event Timeline

yuvipanda raised the priority of this task from to High.
yuvipanda updated the task description.
yuvipanda added a subscriber: yuvipanda.

What's the status of this? Is it blocked on someone outside the Labs team?

Ugh, this fell through the cracks :|

Ideally, someone with more kernel expertise than the Labs team would investigate how to get this machine booting on a kernel that's new enough to avoid the memory issues that @BBlack pointed out. If we can't find anyone matching that description in the meantime, we should just bring this back up on an older kernel and keep it up as a lifeboat.

I've asked for help in the ops@ list again.

@Andrew says that similar issues had cropped up in another machine before, and a rollback to an older kernel fixed it.

The memory issues were just a random guess, not real evidence. I do think getting on newer kernels is probably a win in general, though. As for the alerts about not finding disks: is this generic to all boxes on certain distros that we upgrade to certain kernels? Something to do with scsi/udev/etc. timeouts on boot before LVM tries to find volumes?

I've rebooted the box into a 3.13 kernel now -- Moritz (or whoever) can now log in and investigate.

I changed the boot via grub; the next reboot will probably go back to the broken 3.16 kernel.

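I checked the initrd for 3.16.0-38-generic and it is missing many drivers which are present in the one for 3.13.0-49. Most importantly, it lacks the hpsa RAID controller driver, so the root disk never shows up at boot.
(initrds are just gzip-compressed cpio archives, so their contents can be listed and compared.)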
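For anyone wanting to repeat that comparison, here is a minimal sketch. It assumes each initrd's file listing has been saved to a text file first (for example with `lsinitramfs` from initramfs-tools) and that modules are stored uncompressed as `.ko`. The function names `module_names` and `missing_modules`, and the `/boot/initrd.img-*` paths in the comments, are illustrative and not taken from this task.

```shell
# Read an initrd file listing on stdin and print the sorted, unique
# kernel module file names (*.ko) with their directories stripped.
module_names() {
    grep '\.ko$' | sed 's|.*/||' | LC_ALL=C sort -u
}

# Print modules that appear in listing $1 but not in listing $2.
missing_modules() {
    old_names=$(mktemp)
    new_names=$(mktemp)
    module_names < "$1" > "$old_names"
    module_names < "$2" > "$new_names"
    # comm -23 keeps only the lines unique to the first (sorted) file.
    LC_ALL=C comm -23 "$old_names" "$new_names"
    rm -f "$old_names" "$new_names"
}

# Hypothetical usage (paths assumed, adjust to the host):
#   lsinitramfs /boot/initrd.img-3.13.0-49-generic > old.txt
#   lsinitramfs /boot/initrd.img-3.16.0-38-generic > new.txt
#   missing_modules old.txt new.txt
```

Comparing by bare module file name rather than full path matters here, because the two listings live under different `lib/modules/<version>/` directories.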

Initially I blamed a broken initramfs backport in Ubuntu, but after some digging it turned out that Ubuntu moved many of the drivers into the linux-image-extra-3.16.0-38-generic package. I've installed that one and re-ran

update-initramfs -u -k 3.16.0-38-generic

and now all drivers are present again.
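A quick way to confirm this for a single driver before the next reboot might look like the sketch below. It assumes the initrd listing is piped in on stdin (for example from `lsinitramfs`) and that modules are stored as plain `.ko` files. `has_module` is a hypothetical helper name, and the initrd path in the comment is an assumption.

```shell
# Succeed (exit 0) if the initrd listing on stdin contains the module
# named in $1, given without the .ko suffix (e.g. hpsa).
has_module() {
    grep -q "/$1\.ko\$"
}

# Hypothetical usage (path assumed, adjust to the host):
#   lsinitramfs /boot/initrd.img-3.16.0-38-generic | has_module hpsa \
#       && echo "hpsa present" || echo "hpsa MISSING"
```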

yuvipanda claimed this task.

Thanks Moritz!