My current understanding of the issue:
All IRQs from the NIC are handled by a single CPU. Under load, Blazegraph saturates this CPU (and others), which creates CPU contention with the NIC IRQs and leads to packets being dropped. Note that we also need a way to limit Blazegraph CPU consumption (T206108). Spreading those IRQs over multiple CPUs should mitigate the contention.
Currently, all NIC-related interrupts are handled by CPU0 (P7629).
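For anyone reproducing this check (roughly what P7629 captured), the per-CPU interrupt counters can be read from /proc/interrupts; the interface name matches the ethtool output in this comment:

```shell
# Show which CPUs service the NIC queue IRQs. Each column after the
# IRQ number is a per-CPU counter; if only the CPU0 column grows under
# load, CPU0 is handling every NIC IRQ.
# "|| true" keeps this harmless on hosts without an eno1 interface.
grep eno1 /proc/interrupts || true
```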
NIC is currently configured with 1 TX and 4 RX queues, with a hardware max of 4 queues:
```
gehel@wdqs2003:~$ sudo ethtool -l eno1
Channel parameters for eno1:
Pre-set maximums:
RX:             4
TX:             4
Other:          0
Combined:       0
Current hardware settings:
RX:             4
TX:             1
Other:          0
Combined:       0
```
Looking at IRQ 79 (eno1-rx-1), it looks like affinity is configured to allow spreading IRQs across all CPUs in NUMA node 0.
```
gehel@wdqs2003:/proc/irq/79$ cat smp_affinity
00ff00ff
gehel@wdqs2003:/proc/irq/79$ cat smp_affinity_list
0-7,16-23
gehel@wdqs2003:/proc/irq/79$ cat affinity_hint
00000000
```
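To decode those values: smp_affinity is a hex bitmask with bit N set when CPU N is allowed to service the IRQ, so 00ff00ff = bits 0-7 and 16-23, which matches the smp_affinity_list of 0-7,16-23. A small sketch (cpu_to_mask is a hypothetical helper, not part of any tool):

```shell
# Compute the single-CPU smp_affinity mask for a given CPU number:
# bit N set for CPU N, printed as hex (the /proc/irq format).
cpu_to_mask() {
  printf '%x\n' $((1 << $1))
}

cpu_to_mask 0    # -> 1     (CPU0 only)
cpu_to_mask 7    # -> 80    (CPU7 only)
cpu_to_mask 16   # -> 10000 (CPU16 only)
```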
My understanding of the various documentation I've read is that the smp_affinity above should be sufficient to spread the IRQs. That does not match what I'm seeing, so I'm probably missing something.
@BBlack: a review of the above and any pointer in the right direction would be welcome!
With some trial and error, it looks like smp_affinity = 00ff00ff allows the IRQ to be handled by any CPU, but in practice it is still handled by the first one (in this case, any == CPU0). Setting each IRQ on a specific CPU (and only one) does spread them. It looks like puppet's interface::rps is setting this up, but it is also setting up a few other things. Time to read some more.
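For the record, the "one CPU per IRQ" spreading described above looks roughly like this (a sketch only, not what interface::rps actually runs; the IRQ list 78-81 is a placeholder, the real numbers come from /proc/interrupts, and writing smp_affinity requires root):

```shell
# Pin each NIC queue IRQ to its own CPU by writing a single-bit mask.
# IRQ numbers below are placeholders for illustration.
cpu=0
for irq in 78 79 80 81; do
  mask=$(printf '%x' $((1 << cpu)))
  echo "would pin IRQ $irq to CPU $cpu (mask $mask)"
  # echo "$mask" > /proc/irq/$irq/smp_affinity   # uncomment on the host, as root
  cpu=$((cpu + 1))
done
```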