Page MenuHomePhabricator

codfw/eqiad hosts occasionally spend > 3 minutes starting networking.service with linux 4.9
Closed, ResolvedPublic

Description

We've observed issues on two codfw cache hosts (cp2006 and cp2015) while bringing up the network interface at startup. Starting networking.service takes a little more than 3 minutes, but eventually succeeds. The unit can be seen on top of systemd-analyze blame:

3min 1.271s networking.service

The problem might be related to the intel_uncore kernel module, in both cases the following output could be found in dmesg:

[Mon Apr 10 15:14:17 2017] INFO: task systemd-udevd:677 blocked for more than 120 seconds.
[Mon Apr 10 15:14:17 2017]       Tainted: G            E   4.9.0-0.bpo.2-amd64 #1
[Mon Apr 10 15:14:17 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Apr 10 15:14:17 2017] systemd-udevd   D    0   677    665 0x00000004
[Mon Apr 10 15:14:17 2017]  ffff9d19b153fc00 0000000000000000 ffff9d39b2a94000 ffff9d19ae362080
[Mon Apr 10 15:14:17 2017]  ffff9d39bf158700 ffffb9179a677b50 ffffffffbc7f668d 0000000000000001
[Mon Apr 10 15:14:17 2017]  0000000000000001 00000000024080c0 00000000024080c0 ffff9d19ae362080
[Mon Apr 10 15:14:17 2017] Call Trace:
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc7f668d>] ? __schedule+0x23d/0x6d0
[Mon Apr 10 15:14:17 2017]  [<ffffffffc08bee30>] ? uncore_pci_remove+0x160/0x160 [intel_uncore]
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc7f6b52>] ? schedule+0x32/0x80
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc7fa089>] ? schedule_timeout+0x249/0x300
[Mon Apr 10 15:14:17 2017]  [<ffffffffc08be554>] ? uncore_alloc_box+0x74/0xd0 [intel_uncore]
[Mon Apr 10 15:14:17 2017]  [<ffffffffc08be64f>] ? uncore_cpu_prepare+0x9f/0x100 [intel_uncore]
[Mon Apr 10 15:14:17 2017]  [<ffffffffc08bee30>] ? uncore_pci_remove+0x160/0x160 [intel_uncore]
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc7f756a>] ? wait_for_completion+0xfa/0x130
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc2a2b30>] ? wake_up_q+0x60/0x60
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc2791d6>] ? cpuhp_issue_call+0x96/0xc0
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc27948a>] ? __cpuhp_setup_state+0xca/0x200
[Mon Apr 10 15:14:17 2017]  [<ffffffffc08dd317>] ? intel_uncore_init+0x1c1/0xeaa [intel_uncore]
[Mon Apr 10 15:14:17 2017]  [<ffffffffc08dd156>] ? uncore_type_init+0x156/0x156 [intel_uncore]
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc20218c>] ? do_one_initcall+0x4c/0x180
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc37c3af>] ? do_init_module+0x5a/0x1f1
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc301de9>] ? load_module+0x23c9/0x28f0
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc2fe650>] ? __symbol_put+0x60/0x60
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc4033a4>] ? vfs_read+0x114/0x130
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc4a3e01>] ? security_capable+0x41/0x60
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc3024fe>] ? SYSC_finit_module+0x8e/0xe0
[Mon Apr 10 15:14:17 2017]  [<ffffffffbc7fb3fb>] ? system_call_fast_compare_end+0xc/0x9b

Event Timeline

These are tracked under the perf section (Performance monitoring) in Kconfig: "Include support for Intel uncore performance events. These are available on NehalemEX and more modern processors".

So we can probably just blacklist these and see whether that makes a difference.

JFTR, this code was only made modular in between Linux 4.4 and 4.9; in Linux 4.4 the code is built-in.

ema renamed this task from codfw hosts occasionally spend > 3 minutes starting networking.service with linux 4.9 to codfw/eqiad hosts occasionally spend > 3 minutes starting networking.service with linux 4.9.Apr 12 2017, 1:43 PM

Change 347848 had a related patch set uploaded (by Ema):
[operations/puppet@production] Blacklist intel_uncore kernel module

https://gerrit.wikimedia.org/r/347848

Change 347848 merged by Ema:
[operations/puppet@production] Blacklist intel_uncore kernel module

https://gerrit.wikimedia.org/r/347848

ema claimed this task.

Blacklisting intel_uncore fixed the problem.

Reopening, I've just seen this happening again on cp1066. This is what systemd-analyze blame reported after a slow but successful boot:

3min 2.726s networking.service

This time intel_cstate was featured in dmesg's output:

[Thu Jul 20 11:38:58 2017] INFO: task systemd-udevd:516 blocked for more than 120 seconds.
[Thu Jul 20 11:38:58 2017]       Not tainted 4.9.0-0.bpo.3-amd64 #1
[Thu Jul 20 11:38:58 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jul 20 11:38:58 2017] systemd-udevd   D    0   516    500 0x00000004
[Thu Jul 20 11:38:58 2017]  ffff9d8f0e223800 0000000000000000 ffff9da712a66100 ffff9d8f0f8cc100
[Thu Jul 20 11:38:58 2017]  ffff9da71f4587c0 ffffb31bd9e5bb70 ffffffffb8a0089d 0000000000000000
[Thu Jul 20 11:38:58 2017]  0000000000000000 00000000b872b7ef 0000000000000001 ffff9d8f0f8cc100
[Thu Jul 20 11:38:58 2017] Call Trace:
[Thu Jul 20 11:38:58 2017]  [<ffffffffb8a0089d>] ? __schedule+0x23d/0x6d0
[Thu Jul 20 11:38:58 2017]  [<ffffffffb8a00d62>] ? schedule+0x32/0x80
[Thu Jul 20 11:38:58 2017]  [<ffffffffb8a04299>] ? schedule_timeout+0x249/0x300
[Thu Jul 20 11:38:58 2017]  [<ffffffffb842ecc5>] ? sched_clock+0x5/0x10
[Thu Jul 20 11:38:58 2017]  [<ffffffffb8a0177a>] ? wait_for_completion+0xfa/0x130
[Thu Jul 20 11:38:58 2017]  [<ffffffffb84a2b80>] ? wake_up_q+0x60/0x60
[Thu Jul 20 11:38:58 2017]  [<ffffffffb8479256>] ? cpuhp_issue_call+0x96/0xc0
[Thu Jul 20 11:38:58 2017]  [<ffffffffb847950a>] ? __cpuhp_setup_state+0xca/0x200
[Thu Jul 20 11:38:58 2017]  [<ffffffffc05580ac>] ? cstate_probe_msr+0xac/0xac [intel_cstate]
[Thu Jul 20 11:38:58 2017]  [<ffffffffc0558173>] ? cstate_pmu_init+0xc7/0xf54 [intel_cstate]
[Thu Jul 20 11:38:58 2017]  [<ffffffffb840218c>] ? do_one_initcall+0x4c/0x180
[Thu Jul 20 11:38:58 2017]  [<ffffffffb857c4cf>] ? do_init_module+0x5a/0x1f1
[Thu Jul 20 11:38:58 2017]  [<ffffffffb8501bff>] ? load_module+0x213f/0x28f0
[Thu Jul 20 11:38:58 2017]  [<ffffffffb84fe6f0>] ? __symbol_put+0x60/0x60
[Thu Jul 20 11:38:58 2017]  [<ffffffffb8603454>] ? vfs_read+0x114/0x130
[Thu Jul 20 11:38:58 2017]  [<ffffffffb86a42e1>] ? security_capable+0x41/0x60
[Thu Jul 20 11:38:58 2017]  [<ffffffffb850259e>] ? SYSC_finit_module+0x8e/0xe0
[Thu Jul 20 11:38:58 2017]  [<ffffffffb8a055fb>] ? system_call_fast_compare_end+0xc/0x9b

So Kconfig says that PERF_EVENTS_INTEL_CSTATE is about perf events for power monitoring. As I don't think we need it, we could blacklist it together with intel-rapl-perf (PERF_EVENTS_INTEL_RAPL) while we're at it.

Change 366548 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] base::kernel: blacklist intel_cstate and intel_rapl_perf

https://gerrit.wikimedia.org/r/366548

Change 366548 merged by Ema:
[operations/puppet@production] base::kernel: blacklist intel_cstate and intel_rapl_perf

https://gerrit.wikimedia.org/r/366548

@ema: All related patches in Gerrit have been merged. Can this task be resolved (via Add Action...Change Status in the dropdown menu), or is there more to do in this task? Asking as you are set as task assignee. Thanks in advance!

@ema: All related patches in Gerrit have been merged. Can this task be resolved

Yup, we haven't seen this happening anymore.