
labstore1003 load spikes
Closed, Declined (Public)


We have been paged 3 times for this and it resolves before I can see what's up.

At the moment I see the following serious users:

root@tools-bastion-03:~# host domain name pointer mwoffliner4.mwoffliner.eqiad.wmflabs.
root@tools-bastion-03:~# host domain name pointer maps-tiles3.maps.eqiad.wmflabs.
root@tools-bastion-03:~# host domain name pointer maps-wma1.maps.eqiad.wmflabs.

But none of them are doing anything abusive afaict.

Event Timeline

I jumped in on mwoffliner4.mwoffliner.eqiad.wmflabs and hot patched it to:

modules='act_mirr ifb'

Then I ran /usr/local/sbin/tc-setup (which is idempotent), as that was the only instance I could see thrashing IO at the time. We didn't get paged again after this, beyond the initial 3 pages.

For the record, this paged again today, 2018-05-04 (flapping).

Some graph data.

eth0 RX/TX bytes:

load avg:

I get an error trying to run nethogs:

# nethogs eth0
creating socket failed while establishing local IP - are you root?
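
When nethogs will not start, one rough fallback is to sample the kernel's interface counters in /proc/net/dev. This only gives totals for the whole interface, not per-process numbers like nethogs would. A minimal sketch, assuming a Linux host with the usual /proc/net/dev layout; the function names iface_bytes and iface_rate are made up for illustration and are not part of any existing tooling here:

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tooling on these hosts.

# Print "RX_BYTES TX_BYTES" for one interface, reading text in
# /proc/net/dev format from stdin. After the "name:" prefix, field 1 is
# received bytes and field 9 is transmitted bytes.
iface_bytes() {
    awk -v ifc="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)      # strip leading padding
            n = index(line, ":")
            if (n == 0) next              # header lines have no colon
            if (substr(line, 1, n - 1) != ifc) next
            split(substr(line, n + 1), f, " ")
            print f[1], f[9]
        }'
}

# Sample the counters twice and print the average rate in bytes/second.
# Usage: iface_rate eth0 5
iface_rate() {
    ifc=$1
    secs=${2:-5}
    before=$(iface_bytes "$ifc" < /proc/net/dev)
    sleep "$secs"
    after=$(iface_bytes "$ifc" < /proc/net/dev)
    echo "$before $after" | awk -v s="$secs" \
        '{ printf "rx %.0f B/s  tx %.0f B/s\n", ($3 - $1) / s, ($4 - $2) / s }'
}
```

Running `iface_rate eth0 5` on the affected host would give a coarse view of whether eth0 traffic lines up with the load spikes, which may be enough to decide where to look next.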

labstore1003 has since been replaced by cloudstore1008/9 (T187456).