While debugging something else, I found a bunch of log entries in dmesg like:
[Mon Aug 9 02:43:37 2021] CPU63: Package temperature above threshold, cpu clock throttled (total events = 453033)
These entries stopped on the 8th of August and have not happened again. Each time, all CPUs complained, and
an ok message followed within the same second:
root@cloudvirt1028:~# dmesg -T | grep 'Package temperature above' | cut -d] -f1 | sort | uniq | sort
[Mon Aug 9 02:37:37 2021
[Mon Aug 9 02:43:37 2021
[Sat Aug 7 18:52:00 2021
[Sat Aug 7 18:57:58 2021
[Sat Aug 7 19:15:10 2021
[Sat Aug 7 19:21:33 2021
[Sat Aug 7 20:01:09 2021
[Sat Aug 7 20:26:29 2021
[Sat Aug 7 20:55:56 2021
[Sat Aug 7 21:20:30 2021
[Sat Aug 7 21:32:59 2021
[Sat Aug 7 21:58:12 2021
[Sat Aug 7 22:03:12 2021
[Sat Aug 7 22:12:26 2021
[Sat Aug 7 22:23:02 2021
[Sat Aug 7 22:31:46 2021
[Sat Aug 7 22:58:59 2021
[Sat Aug 7 23:04:11 2021
[Sat Aug 7 23:14:59 2021
[Sat Aug 7 23:30:30 2021
[Sat Aug 7 23:45:17 2021
[Sat Aug 7 23:50:29 2021
[Sun Aug 8 03:21:42 2021
[Sun Aug 8 05:42:01 2021
[Sun Aug 8 06:40:13 2021
[Sun Aug 8 07:04:24 2021
[Sun Aug 8 07:36:09 2021
[Sun Aug 8 07:42:47 2021
[Sun Aug 8 07:51:42 2021
[Sun Aug 8 07:57:30 2021
[Sun Aug 8 08:03:39 2021
[Sun Aug 8 08:12:17 2021
[Sun Aug 8 08:30:35 2021
[Sun Aug 8 08:38:50 2021
[Sun Aug 8 08:59:13 2021
[Sun Aug 8 09:04:35 2021
[Sun Aug 8 09:12:13 2021
[Sun Aug 8 09:29:10 2021
[Sun Aug 8 09:34:53 2021
[Sun Aug 8 09:40:42 2021
[Sun Aug 8 09:50:15 2021
[Sun Aug 8 09:55:16 2021
[Sun Aug 8 10:00:16 2021
[Sun Aug 8 10:12:21 2021
[Sun Aug 8 10:21:29 2021
[Sun Aug 8 10:44:51 2021
[Sun Aug 8 11:25:05 2021
[Sun Aug 8 12:04:14 2021
[Sun Aug 8 12:56:53 2021
[Sun Aug 8 13:15:09 2021
[Sun Aug 8 13:20:31 2021
[Sun Aug 8 13:32:23 2021
[Sun Aug 8 13:37:30 2021
[Sun Aug 8 13:56:31 2021
[Sun Aug 8 14:02:34 2021
[Sun Aug 8 14:23:33 2021
[Sun Aug 8 14:32:23 2021
[Sun Aug 8 14:39:36 2021
[Sun Aug 8 14:44:46 2021
[Sun Aug 8 14:54:17 2021
[Sun Aug 8 15:55:52 2021
[Sun Aug 8 16:01:01 2021
[Sun Aug 8 16:25:46 2021
[Sun Aug 8 16:31:09 2021
[Sun Aug 8 16:37:15 2021
[Sun Aug 8 16:43:19 2021
[Sun Aug 8 17:00:11 2021
[Sun Aug 8 17:06:26 2021
[Sun Aug 8 17:57:00 2021
[Sun Aug 8 18:02:20 2021
[Sun Aug 8 18:07:20 2021
[Sun Aug 8 18:19:12 2021
[Sun Aug 8 18:25:52 2021
[Sun Aug 8 18:37:15 2021
[Sun Aug 8 18:53:55 2021
[Sun Aug 8 19:00:07 2021
[Sun Aug 8 19:13:09 2021
[Sun Aug 8 19:19:20 2021
[Sun Aug 8 19:25:24 2021
[Sun Aug 8 19:31:59 2021
[Sun Aug 8 19:49:41 2021
[Sun Aug 8 19:54:41 2021
[Sun Aug 8 20:07:34 2021
[Sun Aug 8 20:32:30 2021
[Sun Aug 8 20:45:16 2021
[Sun Aug 8 21:01:03 2021
[Sun Aug 8 22:01:13 2021
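Note that `sort | uniq | sort` orders the timestamps as text, which is why the "Mon Aug 9" entries show up before the "Sat Aug 7" ones. A minimal sketch of sorting them chronologically and counting events per day might look like the following. The function names `event_times` and `events_per_day` are made up for illustration, and it assumes the `dmesg -T` timestamp layout shown above with English day and month names.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the leading "[Mon Aug 9 02:43:37 2021" part, with or without the closing "]".
STAMP = re.compile(r"^\[([^\]]+)")
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # assumes an English/C locale


def event_times(lines):
    """Return the unique event timestamps found in `lines`, oldest first."""
    seen = set()
    for line in lines:
        match = STAMP.match(line.strip())
        if not match:
            continue
        try:
            seen.add(datetime.strptime(match.group(1).strip(), STAMP_FORMAT))
        except ValueError:
            # Not a human-readable dmesg timestamp, skip the line.
            continue
    return sorted(seen)


def events_per_day(times):
    """Count how many distinct throttling timestamps fall on each calendar day."""
    return Counter(t.date() for t in times)
```

It could be fed with something like `event_times(sys.stdin)` on the output of `dmesg -T | grep 'Package temperature above'`, which makes it easier to see when the throttling started and stopped.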
Looking at other cloudvirts, it seems to be a common issue. I will gather some info.
This is happening on:
dcaro@cumin1001:~$ sudo cumin cloudvirt1* 'dmesg -T | grep "Package temperature above" | tail | cut -d] -f1 | sort | uniq | sort'
34 hosts will be targeted:
cloudvirt[1012-1014,1016-1046].eqiad.wmnet
Ok to proceed on 34 hosts? Enter the number of affected hosts to confirm or "q" to quit 34
===== NODE GROUP =====
(1) cloudvirt1030.eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep ...rt | uniq | sort' -----
[Tue Sep 28 11:40:19 2021
===== NODE GROUP =====
(1) cloudvirt1026.eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep ...rt | uniq | sort' -----
[Sun Sep 26 12:18:11 2021
===== NODE GROUP =====
(1) cloudvirt1029.eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep ...rt | uniq | sort' -----
[Mon Sep 27 20:54:27 2021
===== NODE GROUP =====
(1) cloudvirt1028.eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep ...rt | uniq | sort' -----
[Thu Sep 9 05:11:48 2021
===== NODE GROUP =====
(1) cloudvirt1027.eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep ...rt | uniq | sort' -----
[Tue Sep 28 12:29:19 2021
===== NODE GROUP =====
(1) cloudvirt1025.eqiad.wmnet
----- OUTPUT of 'dmesg -T | grep ...rt | uniq | sort' -----
[Tue Sep 28 15:01:25 2021
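To compare hosts at a glance, the cumin output can be reduced to the latest event per host. This is only a rough sketch: `latest_event_per_host` is a hypothetical helper, and it assumes the output arrives one record per line in the layout pasted above (a `(N) host` line, then an `OUTPUT of` line, then the timestamps).

```python
import re
from datetime import datetime

NODE_LINE = re.compile(r"^\(\d+\)\s+(\S+)")   # e.g. "(1) cloudvirt1030.eqiad.wmnet"
STAMP_LINE = re.compile(r"^\[([^\]]+)")        # e.g. "[Tue Sep 28 11:40:19 2021"
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"          # assumes an English/C locale


def latest_event_per_host(cumin_output):
    """Map each node expression in the cumin output to its most recent timestamp."""
    latest = {}
    host = None
    for raw in cumin_output.splitlines():
        line = raw.strip()
        node = NODE_LINE.match(line)
        if node:
            host = node.group(1)
            continue
        stamp = STAMP_LINE.match(line)
        if stamp is None or host is None:
            continue
        try:
            when = datetime.strptime(stamp.group(1).strip(), STAMP_FORMAT)
        except ValueError:
            # Line starts with "[" but is not a timestamp, ignore it.
            continue
        if host not in latest or when > latest[host]:
            latest[host] = when
    return latest
```

Sorting the resulting dictionary by value would show which hosts throttled most recently, which helps when checking whether the affected hosts share a rack.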
Those hosts are in row C8 (cloudvirt1025/1026/1027) and D5 (cloudvirt1029/1030).