baham (ns1) CPU-related issues
Closed, Resolved, Public

Description

The first sign of trouble was a routine authdns-update hanging and failing on baham. The first symptom we found was acpi_pad locking up CPUs, as in T123924. We ran "rmmod acpi_pad", which cleared the lockups, but the system was still slow. We then observed a super-low CPU frequency (~175 MHz), as in T147905. After a reboot, the CPU frequency was still stuck low. We rebooted again into the BIOS, enabled HT (it was disabled), and switched CPU power management from our normal Performance Per Watt (OS) setting to Performance. The machine then booted at reasonable CPU speeds and resumed normal service. Not sure where we want to go with this in terms of investigating the relation to the past incidents on other machines, and/or whether we think this is some kind of HW or iDRAC fault.
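
For correlating future incidents, the two symptoms described above (acpi_pad being loaded and CPU clocks stuck far below normal) could be checked with something like the following. This is only a rough sketch, not something that was run during this incident: it assumes a Linux x86 host where /proc/cpuinfo exposes "cpu MHz" lines and /proc/modules lists loaded modules, and the 400 MHz threshold and all function names are made up for illustration.

```python
"""Sketch: flag CPUs reporting suspiciously low clock speeds on Linux."""
from pathlib import Path

# Assumed threshold, chosen for illustration only; the task reports ~175 MHz.
LOW_MHZ = 400.0


def parse_cpu_mhz(cpuinfo_text):
    """Return {cpu_index: MHz} parsed from /proc/cpuinfo-style text."""
    freqs = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "cpu MHz" and cpu is not None:
            freqs[cpu] = float(value)
    return freqs


def low_freq_cpus(freqs, threshold=LOW_MHZ):
    """Return the sorted CPU indexes whose reported clock is below threshold."""
    return sorted(cpu for cpu, mhz in freqs.items() if mhz < threshold)


def module_loaded(modules_text, name):
    """True if `name` is the first field of any /proc/modules-style line."""
    return any(
        line.split()[0] == name
        for line in modules_text.splitlines()
        if line.strip()
    )


def main():
    cpuinfo = Path("/proc/cpuinfo")
    modules = Path("/proc/modules")
    if cpuinfo.exists():
        slow = low_freq_cpus(parse_cpu_mhz(cpuinfo.read_text()))
        print("CPUs below %.0f MHz: %s" % (LOW_MHZ, slow or "none"))
    if modules.exists():
        print("acpi_pad loaded:", module_loaded(modules.read_text(), "acpi_pad"))


if __name__ == "__main__":
    main()
```

The parsing is kept separate from the file reads so the same functions can be fed saved /proc snapshots from other affected machines.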

See also T101525 about making authdns infra more resilient against single machine failure.

Event Timeline

ema triaged this task as High priority. Mar 8 2017, 6:55 AM
ema moved this task from Backlog to Some old column on the Traffic board.

@BBlack do you think this should stay open as a separate task given the recent changes regarding acpi_pad and blacklisting? I realize the HT / BIOS thing is unrelated but might also be tracked in another place.

BBlack assigned this task to Dzahn.

I'll close it for now. If we see more strange issues with super-low CPU freqs, we can always search for these tasks to correlate, I guess.