The first sign of trouble was a routine authdns-update hanging and failing on baham. The first symptom we found was acpi_pad locking up CPUs, as in T123924. We ran "rmmod acpi_pad", which cleared the lockups, but the system was still slow. We then observed an extremely low CPU frequency (~175 MHz), as in T147905. We rebooted the machine and the CPU frequency was still stuck low. We rebooted again into the BIOS, enabled HT (it was disabled), and switched the CPU power management setting from our normal Performance Per Watt (OS) to Performance. The machine then booted at reasonable CPU speeds and resumed normal service. It is not yet clear how far we want to investigate the relation to past incidents on other machines, or whether we think this is some kind of HW or iDRAC fault.
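For future occurrences, the stuck-low frequency condition described above could be checked quickly with something like the following. This is only a minimal sketch, not the tooling used during the incident: it assumes the host exposes the standard Linux cpufreq sysfs files (`scaling_cur_freq`, reported in kHz), and the 400 MHz threshold and the function names are hypothetical choices for illustration.

```python
import glob
import re

# Standard Linux cpufreq sysfs location; present only when a cpufreq driver is loaded.
SYSFS_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_cur_freqs_khz(pattern=SYSFS_GLOB):
    """Return {cpu_name: current frequency in kHz} read from sysfs."""
    freqs = {}
    for path in glob.glob(pattern):
        match = re.search(r"/(cpu\d+)/cpufreq/", path)
        if not match:
            continue
        try:
            with open(path) as fh:
                freqs[match.group(1)] = int(fh.read().strip())
        except (OSError, ValueError):
            # Skip CPUs whose file is unreadable or malformed.
            continue
    return freqs


def stuck_low(freqs_khz, threshold_mhz=400):
    """Return CPU names below threshold_mhz (assumed threshold), in CPU order."""
    low = [cpu for cpu, khz in freqs_khz.items() if khz < threshold_mhz * 1000]
    return sorted(low, key=lambda name: int(name[3:]))


if __name__ == "__main__":
    freqs = read_cur_freqs_khz()
    for cpu in stuck_low(freqs):
        print(f"{cpu}: {freqs[cpu] / 1000:.0f} MHz")
```

Run on the affected host, this prints one line per CPU that is currently below the threshold. An idle machine with a power-saving governor may legitimately report low frequencies, so the output is only meaningful under load.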
See also T101525 about making authdns infra more resilient against single machine failure.