Investigate ms-be hosts performance during rebalances
Closed, Resolved · Public

Description

During the latest eqiad swift / ms-be rebalances I've noticed that the new hosts experience higher latency than the rest, especially around PUT/DELETE requests.

Some things off the top of my head that are worth investigating:

  1. perf top shows native_queued_spin_lock_slowpath, which made me realize we're not load-balancing IRQs across CPUs, though we should be (similar to cp / lvs hosts); see the sketch after this list
  2. test rebalances with less weight (i.e. moving fewer partitions around)
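
For illustration, here is a minimal sketch (not the production interface-rps.py; the interface name and CPU list are made up) of what the RPS side of "load-balancing across CPUs" boils down to: writing a one-CPU bitmask into each RX queue's rps_cpus file so that receive processing is spread over the chosen CPUs.

import glob
import os

def spread_rps(iface: str, cpus: list[int]) -> None:
    """Round-robin the interface's RX queues over the given CPUs via RPS."""
    rx_queues = sorted(
        glob.glob(f"/sys/class/net/{iface}/queues/rx-*"),
        key=lambda path: int(path.rsplit("-", 1)[1]),
    )
    for i, queue in enumerate(rx_queues):
        mask = 1 << cpus[i % len(cpus)]  # single-CPU bitmask for this queue
        with open(os.path.join(queue, "rps_cpus"), "w") as f:
            f.write(f"{mask:x}\n")

# Hypothetical example: spread eno5's queues over CPUs 0-9
# spread_rps("eno5", cpus=list(range(10)))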

Event Timeline

Change 655636 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] WIP: add interface::rps to swift::storage

https://gerrit.wikimedia.org/r/655636

Change 655636 merged by Filippo Giunchedi:
[operations/puppet@production] role: add interface::rps to swift::storage

https://gerrit.wikimedia.org/r/655636

Change 655902 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] role: add interface::rps to swift::storage

https://gerrit.wikimedia.org/r/655902

Change 655902 merged by Filippo Giunchedi:
[operations/puppet@production] role: add interface::rps to swift::storage

https://gerrit.wikimedia.org/r/655902

Change 656132 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: apply interface::rps to bnx2x as well

https://gerrit.wikimedia.org/r/656132

Change 656132 merged by Filippo Giunchedi:
[operations/puppet@production] swift: apply interface::rps to bnx2x as well

https://gerrit.wikimedia.org/r/656132

The interface::rps define is active only for Broadcom NICs at the moment; I noticed that some HP hosts use the i40e driver instead, and AFAICT we don't already have interface::rps applied to any hosts using i40e. I tried testing interface-rps.py on ms-be2056.

The script worked in the sense that there were no errors, but I'd like confirmation from e.g. @BBlack or @faidon that things look as they should on ms-be2056, and/or whether the script's logic needs adjusting for i40e NICs. Thanks!

root@ms-be2056:/home/filippo# ./interface-rps.py eno5
/sys/class/net/eno5/queues/rx-0/rps_cpus = 1
/sys/class/net/eno5/queues/tx-0/xps_cpus = 1
/sys/class/net/eno5/queues/rx-1/rps_cpus = 2
/sys/class/net/eno5/queues/tx-1/xps_cpus = 2
/sys/class/net/eno5/queues/rx-2/rps_cpus = 4
/sys/class/net/eno5/queues/tx-2/xps_cpus = 4
/sys/class/net/eno5/queues/rx-3/rps_cpus = 8
/sys/class/net/eno5/queues/tx-3/xps_cpus = 8
/sys/class/net/eno5/queues/rx-4/rps_cpus = 10
/sys/class/net/eno5/queues/tx-4/xps_cpus = 10
/sys/class/net/eno5/queues/rx-5/rps_cpus = 20
/sys/class/net/eno5/queues/tx-5/xps_cpus = 20
/sys/class/net/eno5/queues/rx-6/rps_cpus = 40
/sys/class/net/eno5/queues/tx-6/xps_cpus = 40
/sys/class/net/eno5/queues/rx-7/rps_cpus = 80
/sys/class/net/eno5/queues/tx-7/xps_cpus = 80
/sys/class/net/eno5/queues/rx-8/rps_cpus = 100
/sys/class/net/eno5/queues/tx-8/xps_cpus = 100
/sys/class/net/eno5/queues/rx-9/rps_cpus = 200
/sys/class/net/eno5/queues/tx-9/xps_cpus = 200
/sys/class/net/eno5/queues/rx-10/rps_cpus = 1
/sys/class/net/eno5/queues/tx-10/xps_cpus = 1
/sys/class/net/eno5/queues/rx-11/rps_cpus = 2
/sys/class/net/eno5/queues/tx-11/xps_cpus = 2
/sys/class/net/eno5/queues/rx-12/rps_cpus = 4
/sys/class/net/eno5/queues/tx-12/xps_cpus = 4
/sys/class/net/eno5/queues/rx-13/rps_cpus = 8
/sys/class/net/eno5/queues/tx-13/xps_cpus = 8
/sys/class/net/eno5/queues/rx-14/rps_cpus = 10
/sys/class/net/eno5/queues/tx-14/xps_cpus = 10
/sys/class/net/eno5/queues/rx-15/rps_cpus = 20
/sys/class/net/eno5/queues/tx-15/xps_cpus = 20
/sys/class/net/eno5/queues/rx-16/rps_cpus = 40
/sys/class/net/eno5/queues/tx-16/xps_cpus = 40
/sys/class/net/eno5/queues/rx-17/rps_cpus = 80
/sys/class/net/eno5/queues/tx-17/xps_cpus = 80
/sys/class/net/eno5/queues/rx-18/rps_cpus = 100
/sys/class/net/eno5/queues/tx-18/xps_cpus = 100
/sys/class/net/eno5/queues/rx-19/rps_cpus = 200
/sys/class/net/eno5/queues/tx-19/xps_cpus = 200
/sys/class/net/eno5/queues/rx-20/rps_cpus = 1
/sys/class/net/eno5/queues/tx-20/xps_cpus = 1
/sys/class/net/eno5/queues/rx-21/rps_cpus = 2
/sys/class/net/eno5/queues/tx-21/xps_cpus = 2
/sys/class/net/eno5/queues/rx-22/rps_cpus = 4
/sys/class/net/eno5/queues/tx-22/xps_cpus = 4
/sys/class/net/eno5/queues/rx-23/rps_cpus = 8
/sys/class/net/eno5/queues/tx-23/xps_cpus = 8
/sys/class/net/eno5/queues/rx-24/rps_cpus = 10
/sys/class/net/eno5/queues/tx-24/xps_cpus = 10
/sys/class/net/eno5/queues/rx-25/rps_cpus = 20
/sys/class/net/eno5/queues/tx-25/xps_cpus = 20
/sys/class/net/eno5/queues/rx-26/rps_cpus = 40
/sys/class/net/eno5/queues/tx-26/xps_cpus = 40
/sys/class/net/eno5/queues/rx-27/rps_cpus = 80
/sys/class/net/eno5/queues/tx-27/xps_cpus = 80
/sys/class/net/eno5/queues/rx-28/rps_cpus = 100
/sys/class/net/eno5/queues/tx-28/xps_cpus = 100
/sys/class/net/eno5/queues/rx-29/rps_cpus = 200
/sys/class/net/eno5/queues/tx-29/xps_cpus = 200
/sys/class/net/eno5/queues/rx-30/rps_cpus = 1
/sys/class/net/eno5/queues/tx-30/xps_cpus = 1
/sys/class/net/eno5/queues/rx-31/rps_cpus = 2
/sys/class/net/eno5/queues/tx-31/xps_cpus = 2
/sys/class/net/eno5/queues/rx-32/rps_cpus = 4
/sys/class/net/eno5/queues/tx-32/xps_cpus = 4
/sys/class/net/eno5/queues/rx-33/rps_cpus = 8
/sys/class/net/eno5/queues/tx-33/xps_cpus = 8
/sys/class/net/eno5/queues/rx-34/rps_cpus = 10
/sys/class/net/eno5/queues/tx-34/xps_cpus = 10
/sys/class/net/eno5/queues/rx-35/rps_cpus = 20
/sys/class/net/eno5/queues/tx-35/xps_cpus = 20
/sys/class/net/eno5/queues/rx-36/rps_cpus = 40
/sys/class/net/eno5/queues/tx-36/xps_cpus = 40
/sys/class/net/eno5/queues/rx-37/rps_cpus = 80
/sys/class/net/eno5/queues/tx-37/xps_cpus = 80
/sys/class/net/eno5/queues/rx-38/rps_cpus = 100
/sys/class/net/eno5/queues/tx-38/xps_cpus = 100
/sys/class/net/eno5/queues/rx-39/rps_cpus = 200
/sys/class/net/eno5/queues/tx-39/xps_cpus = 200
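
For readers less familiar with these sysfs files: each value above is a hexadecimal CPU bitmask, so 1 selects CPU 0 and 200 selects CPU 9, and the pattern repeating every 10 queues shows the 40 queues being spread 4:1 over CPUs 0-9. A tiny decoding sketch (illustrative only, not part of interface-rps.py):

def decode_mask(hex_mask: str) -> list[int]:
    """Return the CPU indices selected by an rps_cpus/xps_cpus hex mask."""
    value = int(hex_mask.replace(",", ""), 16)  # sysfs comma-separates 32-bit words
    return [cpu for cpu in range(value.bit_length()) if value & (1 << cpu)]

for queue, mask in [("rx-0", "1"), ("rx-9", "200"), ("rx-10", "1"), ("rx-39", "200")]:
    print(queue, "->", decode_mask(mask))  # rx-0 -> [0], rx-9 -> [9], rx-10 -> [0], ...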

Change 656837 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: decrease object replicator concurrency

https://gerrit.wikimedia.org/r/656837

Change 657372 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet@production] Update HDFS folder creation for analytics refinery

https://gerrit.wikimedia.org/r/657372

Change 657372 merged by Elukey:
[operations/puppet@production] Update HDFS folder creation for analytics refinery

https://gerrit.wikimedia.org/r/657372

Change 656837 merged by Filippo Giunchedi:
[operations/puppet@production] swift: decrease object replicator concurrency

https://gerrit.wikimedia.org/r/656837

Catching up on this from my backlog: the basics all seem to miraculously work well enough by default for this case. The NUMA filtering works, the tx/rx queue mapping works, the IRQ counts indicate that it's doing what it claims to do, etc. The only minor issue is that the i40e driver by default configures 40 queues to match the 40 CPUs it counts on the host, while interface-rps knows there are only 10 real (as opposed to Hyperthread-sibling) CPUs attached to the NUMA domain closest to the card. interface-rps.py handles this reasonably well and simply maps 4 queues to each of its real target CPUs, but it would probably be better to trim the card down to 10 queues and let everything map 1:1.

interface::rps already has support for doing this, and it looks like it would also work fine as-is on this card: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/interface/manifests/rps.pp#45 . However, because we were afraid of fallout on $random_cards, there's an if-guard that only lets it run on the Broadcom drivers we know well; if you add a match for i40e to that regex, it should work. Keep in mind the comments there: deploying this ethtool Exec to a host for the first time will likely blip the interface's link status while the queues are reconfigured and cause a tiny outage.
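
For reference, a rough sketch of what that queue trimming amounts to (an illustration, not the Puppet code; the interface name and queue count are examples): set the NIC's combined channels to the number of real CPUs on its local NUMA node so queues map 1:1. As noted above, the first run reconfigures the queues and will likely blip the link.

import subprocess

def set_combined_queues(iface: str, n_queues: int) -> None:
    """Set the NIC's combined channel (queue) count with `ethtool -L`."""
    # Equivalent to running: ethtool -L <iface> combined <n_queues>
    subprocess.run(["ethtool", "-L", iface, "combined", str(n_queues)], check=True)

# set_combined_queues("eno5", 10)  # e.g. 10 real CPUs on the card's NUMA node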

Change 661053 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] interfaces: allow setting queues on i40e NICs

https://gerrit.wikimedia.org/r/661053

Change 661054 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: apply interface::rps to i40e NICs

https://gerrit.wikimedia.org/r/661054

Thank you for checking things out! I'm glad interface-rps.py does the right thing by default in the i40e case as well. I've added i40e to the driver list for review here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/661053 and indeed the ethtool Exec gets run (see https://puppet-compiler.wmflabs.org/compiler1003/27800/ms-be2056.codfw.wmnet/index.html) when it's used from swift::performance (https://gerrit.wikimedia.org/r/c/operations/puppet/+/661054).

Change 661054 merged by Filippo Giunchedi:
[operations/puppet@production] swift: apply interface::rps to i40e NICs

https://gerrit.wikimedia.org/r/661054

Change 661053 abandoned by Filippo Giunchedi:
[operations/puppet@production] interfaces: allow setting queues on i40e NICs

Reason:
Superseded by https://gerrit.wikimedia.org/r/c/operations/puppet/+/662688/5/modules/interface/files/interface-rps.py#162

https://gerrit.wikimedia.org/r/661053

fgiunchedi claimed this task.

We're balancing IRQs amongst CPUs now, and will be setting the NIC queue count for i40e NICs via https://gerrit.wikimedia.org/r/c/operations/puppet/+/662688/5/modules/interface/files/interface-rps.py#162