
LVS interface settings from /e/n/i not consistently applied on first boots
Open, Medium · Public · 0 Estimated Story Points

Description

On the initial boot (or two or three?) of some newly-provisioned 4-port LVS servers in core DCs, the customized interface settings that are puppetized into /etc/network/interfaces (/e/n/i) are sometimes not all applied, at least on the primary interface. The block of /e/n/i commands in question looks like this, e.g.:

offload-lro off
up /usr/local/sbin/interface-rps enp4s0f0
up echo 10000 > /sys/class/net/enp4s0f0/tx_queue_len
hardware-dma-ring-rx 4078
up ethtool -A enp4s0f0 autoneg off tx off rx off ||:

For the three non-primary interfaces, these settings are applied from an iface foo inet manual block near the end of the file, whereas for the primary they're part of the initial iface foo inet static block along with the primary IP, gateway, etc. What was observed was that on the initial reboot (after reimaging did its post-puppet reboot), some or all of these settings were not in effect for the primary interface, but were for the other three. This can be confirmed by e.g. checking /proc/interrupts for the proper IRQ count stagger on the interface across CPU cores, querying the autoneg and RX ring settings from ethtool, or looking at tx_queue_len in sysfs.
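
As a rough sketch of those checks (not something that exists in puppet today), the tx_queue_len comparison could be scripted along these lines. The function name check_txqueuelen and the SYSFS_NET override are made up for illustration; the ethtool and /proc/interrupts checks are listed as comments because their output needs a human eye:

```shell
#!/bin/sh
# Hypothetical helper: compare an interface's tx_queue_len against the value
# the /e/n/i stanza is supposed to set. SYSFS_NET is an assumed override so
# the function can also be pointed at a fake tree for testing.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

check_txqueuelen() {
    iface="$1"
    want="$2"
    # An unreadable or missing file is treated as an empty value.
    got="$(cat "$SYSFS_NET/$iface/tx_queue_len" 2>/dev/null)" || got=""
    if [ "$got" = "$want" ]; then
        echo "$iface: tx_queue_len OK ($got)"
        return 0
    fi
    echo "$iface: tx_queue_len is '$got', expected $want"
    return 1
}

# On a real host, using the interface name from the stanza above:
#   check_txqueuelen enp4s0f0 10000
#   ethtool -g enp4s0f0              # current RX ring size
#   ethtool -a enp4s0f0              # pause / autoneg settings
#   ethtool -k enp4s0f0              # offload settings, including LRO
#   grep enp4s0f0 /proc/interrupts   # IRQ count stagger across CPU cores
```

Running the same checks against one of the non-primary interfaces on the same host gives a known-good baseline to compare against.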

Normally during bootup, we see at least the output of interface-rps attributed to sh in the syslogs from running up commands for each interface, like:

May 20 14:56:52 lvs1013 sh[757]: /proc/irq/70/smp_affinity = 1
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/rx-0/rps_cpus = 1
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-0/xps_cpus = 1
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-8/xps_cpus = 1
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-16/xps_cpus = 1
May 20 14:56:52 lvs1013 sh[757]: /proc/irq/71/smp_affinity = 4
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/rx-1/rps_cpus = 4
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-1/xps_cpus = 4
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-9/xps_cpus = 4
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-17/xps_cpus = 4
May 20 14:56:52 lvs1013 sh[757]: /proc/irq/72/smp_affinity = 10
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/rx-2/rps_cpus = 10
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-2/xps_cpus = 10
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-10/xps_cpus = 10
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-18/xps_cpus = 10
May 20 14:56:52 lvs1013 sh[757]: /proc/irq/73/smp_affinity = 40
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/rx-3/rps_cpus = 40
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-3/xps_cpus = 40
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-11/xps_cpus = 40
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-19/xps_cpus = 40
May 20 14:56:52 lvs1013 sh[757]: /proc/irq/74/smp_affinity = 100
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/rx-4/rps_cpus = 100
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-4/xps_cpus = 100
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-12/xps_cpus = 100
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-20/xps_cpus = 100
May 20 14:56:52 lvs1013 sh[757]: /proc/irq/75/smp_affinity = 400
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/rx-5/rps_cpus = 400
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-5/xps_cpus = 400
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-13/xps_cpus = 400
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-21/xps_cpus = 400
May 20 14:56:52 lvs1013 sh[757]: /proc/irq/76/smp_affinity = 1000
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/rx-6/rps_cpus = 1000
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-6/xps_cpus = 1000
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-14/xps_cpus = 1000
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-22/xps_cpus = 1000
May 20 14:56:52 lvs1013 sh[757]: /proc/irq/77/smp_affinity = 4000
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/rx-7/rps_cpus = 4000
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-7/xps_cpus = 4000
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-15/xps_cpus = 4000
May 20 14:56:52 lvs1013 sh[757]: /sys/class/net/enp5s0f1/queues/tx-23/xps_cpus = 4000

But in the cases where the primary didn't get all of its settings on bootup, we see this in place of the expected output for that one interface:

May 20 14:57:00 lvs1013 sh[736]: RTNETLINK answers: File exists
May 20 14:57:00 lvs1013 sh[736]: ifup: failed to bring up enp4s0f0
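
To spot affected hosts after a reboot, the failure message above can be searched for directly. This is a minimal sketch: the function name failed_ifups is made up, and the log path in the usage comment is an assumption about where these sh[...] lines end up on a given host:

```shell
#!/bin/sh
# Hypothetical helper: print each interface for which ifup reported a failure,
# based on the "ifup: failed to bring up <iface>" message format shown above.
failed_ifups() {
    logfile="$1"
    sed -n 's/.*ifup: failed to bring up \([^ ]*\).*/\1/p' "$logfile" | sort -u
}

# Example (log path is an assumption; adjust for the host):
#   failed_ifups /var/log/syslog
```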

After multiple reboots, all of the new LVSes in question seem to have eventually settled into a pattern of always coming up correctly. But note that lvs1016, which was pressed into service much earlier than the rest, continues to have untweaked interface settings on its primary interface, probably for lack of sufficient reboots so far. We'll get to test this once we're done re-arranging the new eqiad LVSes and it's an easily-rebootable secondary.

My primary working theory is that some of the up-commands here make the interface unavailable to be modified by the other commands for a short while after they've completed (command exited). For example, the command which sets the DMA ring size or the LRO setting may exit quickly, but the interface is unavailable for a brief while afterwards, causing the interface-rps or the autoneg command to fail. If any of them fail, ifup doesn't execute any of the remaining ones. Some of these commands cause their settings to persist across (at least soft) reboots, making future executions a quick no-op, which would explain why things start working consistently after a few reboots (eventually all of the potentially-stalling commands are no-ops). I'm still not exactly sure why it seems to only show up on the primary interface so far.
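
If the timing theory holds, one way to test it would be to wrap the up-commands in a short retry loop, so that a briefly-unavailable interface doesn't abort the rest of ifup. The following is only a sketch of that idea, not a confirmed fix: the function name retry_cmd and the install path in the usage comment are hypothetical, and it would not help if the RTNETLINK error comes from somewhere other than these commands:

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <tries> times, pausing <delay>
# seconds between attempts. Returns 0 on the first success, 1 if every
# attempt fails.
retry_cmd() {
    tries="$1"
    delay="$2"
    shift 2
    n=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -ge "$tries" ]; then
            echo "retry_cmd: giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example /e/n/i usage, assuming the function were installed as a standalone
# script at a path such as /usr/local/sbin/retry-cmd (hypothetical):
#   up /usr/local/sbin/retry-cmd 5 1 /usr/local/sbin/interface-rps enp4s0f0
```

If the retries show up in syslog on a first boot and the settings then stick, that would support the theory; if ifup still fails with no retries logged, the cause is somewhere else.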

Anyway, this is something to keep an eye on in future provisioning and reboots until we understand it better.

Event Timeline

BBlack triaged this task as Medium priority. May 21 2019, 2:29 PM
BBlack created this task.
Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2019-05-21T17:59:18Z] <bblack> rebooting lvs1016 in attempt to clear interface config issues - T224027

FWIW, lvs1016 came back with correct settings after the single additional reboot above.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!