
buster reimaging broken with "No kernel modules found"
Closed, Resolved, Public

Description

@MoritzMuehlenhoff updated the netinst image for the latest buster point release. I've run puppet on install*, and I've run sudo -u mirror ftpsync on sodium, but I'm still getting "No kernel modules found":

(screenshot attached: image.png)

Event Timeline

From d-i syslog:

May 11 09:11:07 anna[5770]: WARNING **: no packages matching running kernel 4.19.0-8-amd64 in archive

Maybe initrd.gz got updated, but not the kernel?
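The warning means anna (the d-i component that retrieves installer udebs) found no kernel-module packages for the ABI of the kernel that actually booted. Below is a minimal Python sketch of that matching check. It is not anna's code. It assumes the Debian udeb naming convention `<group>-modules-<abi>-di`. The function names and the archive listing are hypothetical, and the listing assumes the point release moved the ABI to -9.

```python
import re

# Matches the anna warning quoted above; the wording is copied from that log line.
WARNING_RE = re.compile(r"no packages matching running kernel (\S+) in archive")


def running_kernel_from_log(line):
    """Return the kernel ABI named in an anna warning (e.g. '4.19.0-8-amd64'), or None."""
    match = WARNING_RE.search(line)
    return match.group(1) if match else None


def matching_module_udebs(kernel_abi, udeb_names):
    """Return the module udebs built for exactly this kernel ABI.

    Assumes the Debian naming convention '<group>-modules-<abi>-di',
    e.g. 'nic-modules-4.19.0-9-amd64-di'.
    """
    suffix = "-" + kernel_abi + "-di"
    return [n for n in udeb_names if "-modules-" in n and n.endswith(suffix)]


if __name__ == "__main__":
    log_line = ("May 11 09:11:07 anna[5770]: WARNING **: "
                "no packages matching running kernel 4.19.0-8-amd64 in archive")
    # Hypothetical archive listing: assumes the point release moved the ABI to -9.
    archive = ["nic-modules-4.19.0-9-amd64-di", "scsi-modules-4.19.0-9-amd64-di"]
    abi = running_kernel_from_log(log_line)
    # An empty result corresponds to the "No kernel modules found" situation.
    print(abi, matching_module_udebs(abi, archive))
```

If the booted kernel is older than the udebs in the mirror, the result is empty. This fits the suspicion that only initrd.gz was refreshed.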

root@puppetmaster1001:/var/lib/puppet/volatile/tftpboot/buster-installer/debian-installer/amd64# ls -l
total 119816
-rw-r--r-- 1 root root   1322936 May  4 19:14 bootnetx64.efi
drwxrwxr-x 2 root root      4096 May  4 19:14 boot-screens
drwxrwxr-x 3 root root      4096 May  4 19:14 grub
-rw-r--r-- 1 root root   1254768 May  4 19:14 grubx64.efi
-rw-rw-r-- 1 root root 114772537 May 11 07:17 initrd.gz
-rw-r--r-- 1 root root   5278960 May  4 19:14 linux
-rw-r--r-- 1 root root     42430 May  4 19:14 pxelinux.0
drwxrwxr-x 2 root root      4096 May  4 19:14 pxelinux.cfg

This seems to be caused by the separation of apt1001 from the new buster-based install servers. Puppet updates /srv/tftpboot on install1003/2003, but the reimage by Kormat probably received a stale version of /srv/tftpboot from apt1001. Adding @Dzahn.

I can confirm that /srv/tftpboot on apt1001 is stale:

kormat@apt1001:/srv/tftpboot/buster-installer/debian-installer/amd64(0:0)$ ls -l
total 119796
-r--r--r-- 1 root root   1322936 Jun 27  2019 bootnetx64.efi
dr-xr-xr-x 2 root root      4096 Feb 10 07:30 boot-screens
dr-xr-xr-x 3 root root      4096 Nov 18 10:46 grub
-r--r--r-- 1 root root   1254768 Jul  3  2019 grubx64.efi
-r--r--r-- 1 root root 114759838 Feb 10 07:30 initrd.gz
-r--r--r-- 1 root root   5270768 Feb 10 07:30 linux
-r--r--r-- 1 root root     42430 Apr  9  2019 pxelinux.0
dr-xr-xr-x 2 root root      4096 Feb  5  2019 pxelinux.cfg
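To confirm staleness without comparing timestamps by eye, you can compare file contents between the reference tree and the tree that is actually served. This is a minimal sketch. It assumes both trees are readable from one machine, for example after copying one of them over. The function names are hypothetical.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file under root (as a path relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sha = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1 MiB chunks so a ~110 MB initrd.gz is not read into memory at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    sha.update(chunk)
            digests[os.path.relpath(path, root)] = sha.hexdigest()
    return digests


def stale_files(reference_root, served_root):
    """Return sorted relative paths that are missing from, or differ in, served_root."""
    reference = tree_digests(reference_root)
    served = tree_digests(served_root)
    return sorted(p for p, d in reference.items() if served.get(p) != d)
```

Suppose this were run with the puppetmaster volatile buster-installer tree as the reference and apt1001's /srv/tftpboot/buster-installer as the served tree. It would have to report at least linux and initrd.gz, because their sizes already differ in the two listings.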

To unbreak current Buster installs, it should be sufficient to replace /srv/tftpboot/buster-installer on apt1001.wikimedia.org with the copy from install1003 or install2003.
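A manual replacement can be done as copy-then-swap, so that a half-finished copy is never served. This is a minimal sketch under several assumptions. The fresh tree from install1003 or install2003 has already been fetched to a local path. The script runs as root, because the apt1001 tree is read-only (r--r--r--, dr-xr-xr-x) and would block removal by an unprivileged user. The function name and the .new/.old suffixes are hypothetical.

```python
import os
import shutil


def replace_tree(source, target):
    """Replace the directory target with a fresh copy of source.

    The copy is built in a sibling staging directory first, so a failed copy
    leaves target untouched; target is only missing between the two renames.
    """
    target = target.rstrip(os.sep)
    staging = target + ".new"
    backup = target + ".old"
    # Clean up leftovers from an earlier interrupted run.
    for leftover in (staging, backup):
        if os.path.exists(leftover):
            shutil.rmtree(leftover)
    shutil.copytree(source, staging)  # copies file modes and timestamps too
    if os.path.exists(target):
        os.rename(target, backup)
    os.rename(staging, target)
    if os.path.exists(backup):
        shutil.rmtree(backup)
```

Building the copy next to the target keeps both renames on the same filesystem. Note that this is only a one-off fix: the tree goes stale again at the next point release unless something keeps it in sync.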

To fix this for good we can either

  • have /srv/tftpboot on apt1001 be populated from the volatile directory
  • introduce installserver aliases similar to the webproxy CNAME (reusing that one seems unclean) and change d-i to fetch the install image from there

Change 595507 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] aptrepo: populate /srv/tftpboot from volatile also on APT_repo servers

https://gerrit.wikimedia.org/r/595507

Change 595507 merged by Dzahn:
[operations/puppet@production] aptrepo: populate /srv/tftpboot from volatile also on APT_repo servers

https://gerrit.wikimedia.org/r/595507

> To fix this for good we can either
>
>   • have /srv/tftpboot on apt1001 be populated from the volatile directory

I implemented this option with the puppet patch above. /srv/tftpboot on apt1001 is now populated from the volatile directory.

I tested it on backup1002 and this worked well. This can be closed.

  • But I wonder if we should have a working group on improving the install and deb services. When they were split, we discussed that the split was well intended but had some surprising consequences, and I think @Dzahn was surprised by this double dependency on these files. Maybe we can come up with a better split strategy?

> I tested it on backup1002 and this worked well. This can be closed.
>
>   • But I wonder if we should have a working group on improving the install and deb services. When they were split, we discussed that the split was well intended but had some surprising consequences, and I think @Dzahn was surprised by this double dependency on these files. Maybe we can come up with a better split strategy?

I think ultimately we should serve the tftpboot environment from the install servers, especially once we add install* servers to the Ganeti clusters in the edge PoPs. But that needs some changes to the install roles, so that they also have a web server etc.

Marostegui assigned this task to MoritzMuehlenhoff.
Marostegui triaged this task as Medium priority.
Marostegui added a subscriber: Marostegui.

Closing per: T252382#6124358
If we want a further discussion on long-term solving, we can always create a new task.

Thanks!

Change 595892 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] introduce sectools1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/595892

> I tested it on backup1002 and this worked well. This can be closed.

Thanks!

> But I wonder if we should have a working group

I don't think that's needed. The discussion on the strategy was (and is) T242602, which can still be used.

And what Moritz said above, to "serve the tftpboot environment from the install servers, especially once we add install* servers to the Ganeti clusters in the edge PoPs", is already agreed on and will be implemented.