There is a regression in the handling of discard/TRIM on software RAID 10, which leads to soft lockups: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460
These hosts are running 6.1.135 with Bookworm and RAID10. They need to be rebooted into linux-image-6.1.0-33-amd64 (6.1.133). Once -33- is running, we can uninstall -34-:
- centrallog2002.codfw.wmnet
- centrallog1002.eqiad.wmnet
- vrts2002.codfw.wmnet
- vrts1003.eqiad.wmnet
- prometheus2005.codfw.wmnet
- prometheus2006.codfw.wmnet
- prometheus2007.codfw.wmnet
- prometheus2008.codfw.wmnet
- prometheus1005.eqiad.wmnet
- prometheus1006.eqiad.wmnet
- prometheus1007.eqiad.wmnet
- prometheus1008.eqiad.wmnet
Once a fixed kernel is released, these hosts can move back to the latest Bookworm kernel.
These Bookworm hosts use software RAID10 but have not yet been rebooted into 6.1.135. On these we only need to uninstall the 6.1.135 kernel:
- cloudnet2005-dev.codfw.wmnet
- cloudnet2006-dev.codfw.wmnet
- cloudnet2007-dev.codfw.wmnet
- cloudnet2008-dev.codfw.wmnet
- cloudnet1005.eqiad.wmnet
- cloudnet1006.eqiad.wmnet
- cloudrabbit1001.eqiad.wmnet
- cloudrabbit1002.eqiad.wmnet
- cloudrabbit1003.eqiad.wmnet
- cloudservices2004-dev.codfw.wmnet
- cloudservices2005-dev.codfw.wmnet
- cloudservices1005.eqiad.wmnet
- cloudservices1006.eqiad.wmnet
- puppetserver2001.codfw.wmnet
- puppetserver2002.codfw.wmnet
- puppetserver2003.codfw.wmnet
- puppetserver2004.codfw.wmnet
- puppetserver1001.eqiad.wmnet
- puppetserver1002.eqiad.wmnet
- puppetserver1003.eqiad.wmnet
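
The two cases above can be told apart per host from the running kernel release and the set of installed kernel packages. The following is a minimal sketch of that decision, not tooling that exists for this task. The function name `plan_action` and the returned labels are hypothetical, and the ABI name `6.1.0-34-amd64` for the 6.1.135 kernel is inferred from the "-34-" shorthand above.

```python
# Kernel ABI names. GOOD_RELEASE is named in the task (6.1.133);
# AFFECTED_RELEASE is assumed to be the ABI name of 6.1.135 ("-34-").
AFFECTED_RELEASE = "6.1.0-34-amd64"
GOOD_RELEASE = "6.1.0-33-amd64"


def plan_action(running_release, installed_packages):
    """Return a (hypothetical) label for what a RAID10 Bookworm host needs.

    running_release: kernel release string as printed by `uname -r`.
    installed_packages: iterable of installed Debian package names.
    """
    affected_pkg = "linux-image-" + AFFECTED_RELEASE

    if running_release == AFFECTED_RELEASE:
        # First list: host is running the regressed kernel.
        return "reboot-into-33-then-remove-34"
    if affected_pkg in set(installed_packages):
        # Second list: regressed kernel installed but never booted.
        return "remove-34"
    return "nothing-to-do"


# Example inputs for the two situations described above.
running_affected = plan_action(
    AFFECTED_RELEASE,
    ["linux-image-" + GOOD_RELEASE, "linux-image-" + AFFECTED_RELEASE],
)
not_yet_rebooted = plan_action(
    GOOD_RELEASE,
    ["linux-image-" + GOOD_RELEASE, "linux-image-" + AFFECTED_RELEASE],
)
```

On a real host the running release could be read with `platform.release()` and the package list taken from `dpkg-query -W`. The sketch deliberately leaves both out so the decision logic stays testable without touching the system.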