
On WMCS linux-perf must be installed from backports to be in sync with linux-image package
Closed, Resolved (Public)

Description

On a Bullseye instance:

$ perf
/usr/bin/perf: line 13: exec: perf_6.1: not found

That is because linux-image is installed from backports but linux-perf is not:

$ apt list --installed 'linux-image*' 'linux-perf*'
linux-image-5.10.0-23-cloud-amd64/now 5.10.179-3 amd64 [installed,local]
linux-image-6.1.0-0.deb11.7-cloud-amd64/now 6.1.20-2~bpo11+1 amd64 [installed,local]
linux-image-cloud-amd64/now 6.1.20-2~bpo11+1 amd64 [installed,upgradable to: 6.1.90-1~bpo11+1]
linux-perf-5.10/oldstable-security,now 5.10.234-1 amd64 [installed,automatic]
linux-perf/oldstable-security,now 5.10.234-1 amd64 [installed]
$ apt-cache policy linux-image-cloud-amd64
linux-image-cloud-amd64:
  Installed: 6.1.20-2~bpo11+1
  Candidate: 6.1.90-1~bpo11+1
  Version table:
     6.1.90-1~bpo11+1 100
        100 http://mirrors.wikimedia.org/debian bullseye-backports/main amd64 Packages
 *** 6.1.20-2~bpo11+1 100
        100 /var/lib/dpkg/status
     5.10.234-1 500
        500 http://security.debian.org/debian-security bullseye-security/main amd64 Packages
     5.10.223-1 500
        500 http://mirrors.wikimedia.org/debian bullseye/main amd64 Packages
$ apt-cache policy linux-perf
linux-perf:
  Installed: 5.10.234-1
  Candidate: 5.10.234-1
  Version table:
     6.1.94-1~bpo11+1 100
        100 http://mirrors.wikimedia.org/debian bullseye-backports/main amd64 Packages
 *** 5.10.234-1 500
        500 http://security.debian.org/debian-security bullseye-security/main amd64 Packages
        100 /var/lib/dpkg/status
     5.10.223-1 500
        500 http://mirrors.wikimedia.org/debian bullseye/main amd64 Packages

linux-perf comes from Puppet base::standard_packages.

I cannot find any apt configuration or preference that would cause linux-image to be installed from backports, hence this task.
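
The `perf_6.1: not found` error is consistent with /usr/bin/perf being a wrapper that picks a versioned binary based on the running kernel. A minimal sketch of that mapping, assuming the wrapper keeps only the major.minor part of `uname -r` (the helper name `expected_perf_binary` is hypothetical and the real wrapper may differ in details):

```shell
#!/bin/sh
# Hypothetical helper: derive the versioned perf binary name that would match
# a kernel release string such as the output of `uname -r`.
expected_perf_binary() {
    release="$1"
    # Drop the ABI/flavour suffix: "6.1.0-0.deb11.7-cloud-amd64" -> "6.1.0"
    version="${release%%-*}"
    # Keep major.minor only: "6.1.0" -> "6.1"
    major="${version%%.*}"
    rest="${version#*.}"
    minor="${rest%%.*}"
    printf 'perf_%s.%s\n' "$major" "$minor"
}

# Example use on a host (not run here):
#   bin="$(expected_perf_binary "$(uname -r)")"
#   command -v "$bin" >/dev/null || echo "$bin is missing"
```

On the instance above this yields `perf_6.1`, while the installed linux-perf-5.10 package would only be expected to provide a 5.10 binary.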

Event Timeline


The recently created instances run 5.10.0, while the older ones are on 6.1.0:

$ sudo cumin --force 'name:docker' 'uname -r'
26 hosts will be targeted:
integration-agent-docker-[1040-1057,1059-1065].integration.eqiad1.wikimedia.cloud,integration-agent-puppet-docker-1003.integration.eqiad1.wikimedia.cloud
FORCE mode enabled, continuing without confirmation
===== NODE GROUP =====
(1) integration-agent-docker-1044.integration.eqiad1.wikimedia.cloud
----- OUTPUT of 'uname -r' -----
6.1.0-0.deb11.21-cloud-amd64
===== NODE GROUP =====
(6) integration-agent-docker-[1060-1065].integration.eqiad1.wikimedia.cloud
----- OUTPUT of 'uname -r' -----
5.10.0-34-cloud-amd64
===== NODE GROUP =====
(2) integration-agent-docker-1059.integration.eqiad1.wikimedia.cloud,integration-agent-puppet-docker-1003.integration.eqiad1.wikimedia.cloud
----- OUTPUT of 'uname -r' -----
5.10.0-33-cloud-amd64
===== NODE GROUP =====
(17) integration-agent-docker-[1040-1043,1045-1057].integration.eqiad1.wikimedia.cloud
----- OUTPUT of 'uname -r' -----
6.1.0-0.deb11.7-cloud-amd64
================

My guess is that the linux-image package was manually installed from backports.
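
The two kernel series in the cumin output can be told apart by their release strings: the 6.1 releases carry a `.deb11.` component, which the 5.10 releases lack. A rough sketch that classifies a `uname -r` string on that basis. Treating that component as a marker of a backports build is an assumption drawn from the output above, and `kernel_origin` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: guess whether a kernel release string looks like a
# backports build, based on a ".debNN." component (assumption, see lead-in).
kernel_origin() {
    case "$1" in
        *.deb[0-9]*.*) echo "backports" ;;
        *)             echo "non-backports" ;;
    esac
}

# Example use on a host (not run here):
#   kernel_origin "$(uname -r)"
```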

On integration-agent-docker-1044:

reboot   system boot  6.1.0-0.deb11.7- Tue Jul  4 14:30:48 2023 - Tue Jul  4 14:32:14 2023  (00:01)
hashar   pts/0        172.16.3.145     Tue Jul  4 14:20:41 2023 - Tue Jul  4 14:30:41 2023  (00:10)
root     ttyS0                         Tue Jul  4 14:18:52 2023 - down                      (00:11)
root     ttyS0                         Tue Jul  4 14:16:31 2023 - Tue Jul  4 14:18:52 2023  (00:02)
reboot   system boot  6.1.0-0.deb11.7- Tue Jul  4 14:16:15 2023 - Tue Jul  4 14:30:42 2023  (00:14)
root     ttyS0                         Thu Jun  8 15:28:43 2023 - down                      (00:00)
reboot   system boot  5.10.0-23-cloud- Thu Jun  8 15:22:11 2023 - Thu Jun  8 15:29:14 2023  (00:07)

And from my shell history:

 1  2023-07-04 14:25:57 sudo rm -fR /var/lib/puppet/ssl && sudo puppet agent -tv
 2  2023-07-04 14:27:58 lsblk
 3  2023-07-04 14:28:12 sudo puppet agent -tv
 4  2023-07-04 14:28:38 sudo lsblk
 5  2023-07-04 14:28:42 sudo lvdisplay 
 6  2023-07-04 14:30:27 df -h
 7  2023-07-04 14:30:37 sudo dmesg -T|grep sd
 8  2023-07-04 14:30:41 sudo reboot-host
 9  2023-07-04 14:31:54 lsblk
10  2023-07-04 14:32:13 sudo reboot-host
11  2023-07-04 14:32:42 sudo puppet agent -tv
12  2023-07-04 14:35:55 lsblk
13  2023-07-04 14:36:14 sudo apt -y dist-upgrade
14  2023-07-04 14:36:24 sudo apt-get autoremove --purge
15  2023-07-04 14:37:04 sudo reboot-host

So my guess is dist-upgrade caused the package to be installed from backports somehow.
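
One way to check that guess is to look in the dpkg log for the moment a 6.x kernel image was first installed. A sketch, assuming the default log location /var/log/dpkg.log (older entries may have been rotated into compressed files that this does not read) and the usual `date time action package:arch old-version new-version` line layout; `kernel_installs` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: list "install" events for linux-image-6* packages
# found in a dpkg log file (defaults to /var/log/dpkg.log).
kernel_installs() {
    log="${1:-/var/log/dpkg.log}"
    # Field 3 is the dpkg action, field 4 the package name with architecture,
    # field 6 the newly installed version.
    awk '$3 == "install" && $4 ~ /^linux-image-6/ { print $1, $2, $4, $6 }' "$log"
}

# Example use on a host (not run here):
#   kernel_installs
```

A timestamp shortly after the `sudo apt -y dist-upgrade` entry in the shell history would support the guess; an earlier one would point to something else.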

This is the runbook I have been using:

# Find grub entries based on https://wiki.debian.org/GrubReboot
sudo awk -F\'\|\" '/(submenu|menuentry) / { printf "%s\t%s\n", $1, $2}' /boot/grub/grub.cfg

# Reboot once into the 5.x kernel (usually entry 2 of submenu 1, hence '1>2')
sudo grub-reboot '1>2' && sudo systemctl reboot

# Downgrade and purge
sudo apt-get install -y --allow-downgrades -V linux-image-cloud-amd64/bullseye-security
sudo apt purge -V -y 'linux-image-6*'

# Reboot into the upgraded 5.x kernel
sudo systemctl reboot

All the Docker agents are back to 5.10.0 kernels:

$ sudo cumin --force 'name:docker' 'uname -r'
26 hosts will be targeted:
integration-agent-docker-[1040-1057,1059-1065].integration.eqiad1.wikimedia.cloud,integration-agent-puppet-docker-1003.integration.eqiad1.wikimedia.cloud
FORCE mode enabled, continuing without confirmation
===== NODE GROUP =====
(2) integration-agent-docker-1059.integration.eqiad1.wikimedia.cloud,integration-agent-puppet-docker-1003.integration.eqiad1.wikimedia.cloud
----- OUTPUT of 'uname -r' -----
5.10.0-33-cloud-amd64
===== NODE GROUP =====
(17) integration-agent-docker-[1040,1042,1044,1046,1051-1057,1060-1065].integration.eqiad1.wikimedia.cloud
----- OUTPUT of 'uname -r' -----
5.10.0-34-cloud-amd64
===== NODE GROUP =====
(7) integration-agent-docker-[1041,1043,1045,1047-1050].integration.eqiad1.wikimedia.cloud
----- OUTPUT of 'uname -r' -----
5.10.0-23-cloud-amd64