
Optimize networking configuration for WDQS
Closed, ResolvedPublic


While investigating T200563, it was found that wdqs[12]003 are dropping packets under load. All interrupts are handled by a single CPU, we should be able to spread that load over more CPUs.

Event Timeline

Gehel created this task. · Oct 3 2018, 9:05 AM
Restricted Application added a project: Wikidata. · View Herald Transcript · Oct 3 2018, 9:05 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Gehel triaged this task as High priority. · Oct 3 2018, 9:05 AM
Gehel added a subscriber: BBlack. · Oct 4 2018, 12:52 PM

My current understanding of the issue:

All IRQs from the NIC are handled by a single CPU. Under load, Blazegraph saturates this CPU (and others), which creates CPU contention with the NIC IRQs and leads to packets being dropped. Note that we also need a way to limit Blazegraph's CPU consumption (T206108). Spreading those IRQs over multiple CPUs should mitigate the contention.

Currently, all NIC-related interrupts are handled by CPU0 (P7629).
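For illustration, the per-CPU distribution can be read from /proc/interrupts. The excerpt below is fabricated (the real counts are in P7629); the awk one-liner just sums the per-CPU counters for the eno1 queues to show that everything lands on CPU0:

```shell
#!/bin/sh
# Sum per-CPU interrupt counts for the eno1 queues from a fabricated
# /proc/interrupts excerpt. On a real host, read /proc/interrupts itself
# instead of the here-doc.
summary=$(awk '/eno1/ { cpu0 += $2; cpu1 += $3 }
               END { printf "CPU0=%d CPU1=%d", cpu0, cpu1 }' <<'EOF'
           CPU0       CPU1
 78:    1200345          0   PCI-MSI  eno1-rx-0
 79:     987654          0   PCI-MSI  eno1-rx-1
 80:     456789          0   PCI-MSI  eno1-rx-2
EOF
)
echo "$summary"   # all traffic is counted against CPU0
```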

The NIC is currently configured with 1 TX and 4 RX queues, with a hardware maximum of 4 queues:

gehel@wdqs2003:~$ sudo ethtool -l eno1
Channel parameters for eno1:
Pre-set maximums:
RX:		4
TX:		4
Other:		0
Combined:	0
Current hardware settings:
RX:		4
TX:		1
Other:		0
Combined:	0
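As a sketch (not something run on these hosts), the TX queue count could in principle be raised toward the hardware maximum with `ethtool -L`; whether the tg3 driver accepts independent rx/tx values is driver-dependent:

```shell
# Sketch only: raise the TX queue count to the hardware maximum of 4.
# Requires root; parameter support depends on the tg3 driver.
sudo ethtool -L eno1 tx 4
# Verify the change took effect:
sudo ethtool -l eno1
```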

With IRQ 79 (eno1-rx-1), it looks like affinity is configured to spread IRQs to all CPUs in NUMA node 0.

gehel@wdqs2003:/proc/irq/79$ cat smp_affinity
gehel@wdqs2003:/proc/irq/79$ cat smp_affinity_list 
gehel@wdqs2003:/proc/irq/79$ cat affinity_hint 

My understanding of the various documentation I have read is that the smp_affinity setting above should be sufficient to spread the IRQs. This does not match what I'm seeing, so I'm probably missing something.

@BBlack: a review of the above and any pointers in the right direction would be welcome!

Gehel claimed this task. · Oct 4 2018, 12:52 PM
Addshore moved this task from incoming to monitoring on the Wikidata board. · Oct 9 2018, 9:34 AM
Gehel added a comment. · Oct 10 2018, 1:10 PM

After some trial and error, it looks like smp_affinity = 00ff00ff would allow the IRQ to be handled by any CPU, but it is still handled by the first one (in this case, any == CPU0). Setting each IRQ on a specific CPU (and only one) does spread them. It looks like the puppet interface::rps class is setting this up, but it is also setting up a few other things. Time to read some more.
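The per-IRQ pinning described above can be sketched as follows. The IRQ numbers and CPU assignments here are made up for illustration (on the real hosts the puppet interface::rps class does this work); each IRQ gets a single-bit mask so exactly one CPU services it:

```shell
#!/bin/sh
# Sketch: pin each RX queue IRQ to its own CPU with a one-bit
# smp_affinity mask. IRQ numbers (78-81) are illustrative only.
cpu=0
for irq in 78 79 80 81; do
    # A mask with only bit N set restricts the IRQ to CPU N.
    mask=$(printf '%x' $((1 << cpu)))
    # On a real host: echo "$mask" > /proc/irq/$irq/smp_affinity
    echo "IRQ $irq -> CPU $cpu (smp_affinity=$mask)"
    cpu=$((cpu + 1))
done
```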

Change 465624 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: spread IRQ from NIC over multiple CPUs

Yes, let's look at this today. I think we need better tg3 ethernet card support in interface::rps for one of our authdnses anyway, which you'll need here too.

Change 467443 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] interface::rps: support tg3 properly

Change 467443 merged by BBlack:
[operations/puppet@production] interface::rps: support tg3 properly

Change 465624 merged by Gehel:
[operations/puppet@production] wdqs: spread IRQ from NIC over multiple CPUs

Mentioned in SAL (#wikimedia-operations) [2018-10-17T13:08:15Z] <gehel> applying rps NIC config for all wdqs nodes - T206105

Gehel added a comment. · Oct 22 2018, 3:59 PM

Some minimal packet drop is still seen (< 100 packets / 24h), but the situation is very much better. More work needs to be done on limiting CPU usage on the Blazegraph side.

Smalyshev closed this task as Resolved. · Nov 6 2018, 6:23 PM