interface-rps should have a flag to avoid CPU0
Open, Normal, Public


As discussed during a recent incident, it's probably a net win to have the RSS/RPS/XPS stuff avoid CPU0, since so many other non-network things and overhead will land on CPU0 by default. Someone should add a new config setting for this so we can try it out experimentally! Probably the best place to do the filtering would be right at the end of get_cpu_list (after the NUMA and HT sibling filters are done and the list is uniq'd and sorted): just remove the literal CPU 0 from the front of the list if it's there.
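The proposed filter could be sketched roughly like this (the function name and flag name are illustrative assumptions, not the actual interface-rps code):

```python
def filter_cpu0(cpu_list, avoid_cpu0=False):
    """Optionally drop CPU 0 from a CPU list.

    Intended to run at the very end of get_cpu_list(), after the NUMA and
    HT-sibling filters, once the list is already uniq'd and sorted (so CPU 0,
    if present, is at the front). Falls back to the full list rather than
    returning an empty one.
    """
    if avoid_cpu0 and len(cpu_list) > 1 and cpu_list[0] == 0:
        return cpu_list[1:]
    return cpu_list
```

With the flag off (the default), behavior is unchanged, which keeps the experiment easy to roll back.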



Event Timeline

BBlack triaged this task as Normal priority.Oct 22 2019, 8:09 PM
BBlack created this task.
Restricted Application added a project: Operations. Oct 22 2019, 8:09 PM
Restricted Application added a subscriber: Aklapper.
CDanis added a subscriber: CDanis.Oct 22 2019, 8:15 PM
ema moved this task from Triage to General on the Traffic board.Mon, Oct 28, 3:57 PM

Change 551907 had a related patch set uploaded (by RLazarus; owner: RLazarus):
[operations/puppet@production] interface: Add a config option to avoid CPU 0 in RPS.

So the patch above adds it to the queue distribution logic in interface-rps, but there's another piece of the puzzle here: setting the hardware's queue count for the interface itself, which is where a bit of a rabbit hole develops...

The first interfaces we had that even supported large and configurable queue counts were the bnx2x devices, and originally we could only configure the count via a kernel module parameter. This is why modules/interface/manifests/rps/modparams.pp exists: it sets up such a parameter (and then requires a reboot after the puppet run to really get things going).

Later, ethtool worked for setting bnx2x counts as well (with the -L parameter), and we added that near the bottom of modules/interface/manifests/rps.pp in exec { "ethtool_rss_combined_channels_${interface}": [...] (which is guarded with a read-check first, because executing a change here blips the interface link state). We kept the modparam as well so that hopefully it's set correctly on boot and the first run of the script doesn't even need to blip the interface. Our newest nodes and cards are all bnxt_en, which is a slightly different driver. This one doesn't even have a modparam, so we're relying on the ethtool execution exclusively for these (increasingly common for traffic nodes) cases.

Both of these puppet-driven mechanisms decide on the count of queues to configure by looking at the facter data under the numa key (which we pretty much created for this and related purposes), but each takes the simple approach of just sizing the array of ht-sets. So in the common case they would continue to configure an extra queue for CPU0 if we turned on the new interface-rps flag but didn't fix this side of things as well.

Probably the simplest way to address this would be to add avoid_cpu0 logic to both manifests referenced above, so that their $num_queues takes this into account (by filtering out the htset which contains CPU 0; the data looks like: "device_to_htset"=>{"eno1"=>[[0, 24], [2, 26], [4, 28], [6, 30], ...], ...}). I could also see an argument, though, for just getting rid of the modparam variant (and cleaning up its remains, especially on existing bnx2x hosts) and folding the ethtool variant into interface-rps itself. That's a little more work, but I think it leaves us in a cleaner state at the end: interface-rps already sorts out the queue count correctly, it would just need to check-and-set the ethtool param early on (right after get_cpu_list, only for certain drivers, and before get_queues, which is affected by it).
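For the first option, the $num_queues sizing with the filter applied amounts to something like this (a sketch; the function name and flag are hypothetical, and the input is the facter device_to_htset data quoted above):

```python
def num_queues(htsets, avoid_cpu0=False):
    """Size the queue count from one device's entry in the facter
    device_to_htset data: one queue per HT-sibling set, optionally
    skipping the set that contains CPU 0."""
    if avoid_cpu0:
        htsets = [s for s in htsets if 0 not in s]
    return len(htsets)
```

So for the example eno1 data, a host that today gets 4 queues would get 3 with the flag on, matching the CPU list that interface-rps would distribute them across.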

Adding @RLazarus in hopes of nerd-sniping him further on this topic...

Change 551907 merged by RLazarus:
[operations/puppet@production] interface: Add a config option to avoid CPU 0 in RPS.