We have been getting port saturation threshold alerts every few minutes for virt1002 & labstore1001. Although the latter is (slowly) being worked on in #7282, I'm not aware of any work happening on virt1002.
Even if we upgrade both, nothing suggests that the added capacity will be enough, since I'm not aware of any data pinpointing what is using this capacity.
This is likely a larger Labs issue that needs to be addressed. It's been going on for many weeks now and is probably degrading Labs' performance significantly, to the point that I wonder why we're not treating it as a very high priority, possibly even as a Labs outage.
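The report points out that nobody has data showing where the bandwidth actually goes. On a Linux hypervisor, one rough way to get that data is to sample the per-interface byte counters in `/proc/net/dev` twice and rank interfaces by throughput. The following is a minimal, hypothetical sketch and not a tool that was used on virt1002. It assumes each VM is attached through its own tap/vnet interface, so per-interface counters approximate per-VM traffic. The function names are illustrative. It ignores counter wraparound and skips interfaces that appear between the two samples.

```python
"""Hypothetical sketch: rank network interfaces by throughput via /proc/net/dev.

Assumption: each VM on the hypervisor has its own tap/vnet interface, so
per-interface byte counters approximate per-VM traffic.
"""
import time
from typing import Dict, List, Tuple


def parse_proc_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def rates(before: Dict[str, Tuple[int, int]],
          after: Dict[str, Tuple[int, int]],
          seconds: float) -> List[Tuple[str, float, float]]:
    """Bytes/second (rx, tx) per interface between two snapshots, busiest first."""
    result = []
    for name, (rx1, tx1) in after.items():
        if name not in before:
            continue  # interface appeared between samples; skip it
        rx0, tx0 = before[name]
        result.append((name, (rx1 - rx0) / seconds, (tx1 - tx0) / seconds))
    return sorted(result, key=lambda r: r[1] + r[2], reverse=True)


def sample(interval: float = 10.0) -> List[Tuple[str, float, float]]:
    """Take two snapshots `interval` seconds apart and return per-interface rates."""
    with open("/proc/net/dev") as f:
        before = parse_proc_net_dev(f.read())
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        after = parse_proc_net_dev(f.read())
    return rates(before, after, interval)
```

Calling `sample()` on the hypervisor and printing the first few entries would show which interfaces, and under the stated assumption which VMs, dominate the link.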
Reference: rt7657
Event Timeline
On Tue Jun 10 04:38:13 2014, faidon filed the report above.
Right now, virt1002 hosts most (almost all) of the Labs VMs, so it has become a serious hotspot. In the short term, I'm going to look into spreading the load across more servers and applying tc rules to cap the amount of bandwidth Labs can use.
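The comment doesn't say which tc rules are involved. A common way to cap bandwidth with tc is the Token Bucket Filter (`tbf`) qdisc, and the sketch below illustrates only the token-bucket idea behind it. It assumes a rate given in bytes per second and a burst size in bytes. The `TokenBucket` class is hypothetical and does not reproduce tc's implementation. In particular, real `tbf` queues excess packets up to a configured limit instead of rejecting them outright, so this sketch behaves more like a simple policer.

```python
"""Illustrative token bucket, the idea behind tc's tbf qdisc (not tc's code)."""
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float          # tokens (bytes) added per second
    burst: float         # bucket capacity in bytes
    tokens: float = 0.0  # current token count
    last: float = 0.0    # timestamp of the previous refill, in seconds

    def __post_init__(self) -> None:
        # Start with a full bucket so an initial burst is allowed.
        self.tokens = self.burst

    def allow(self, size: float, now: float) -> bool:
        """Return True if a packet of `size` bytes may be sent at time `now`."""
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # over the rate: a policer drops, tbf would queue
```

With `rate=1000.0` and `burst=1500.0`, a 1500-byte packet at t=0 passes and empties the bucket. A 1000-byte packet at t=0.5 is refused because only 500 tokens have accumulated. Another 1000-byte packet at t=1.0 passes. The burst size sets how much short-term excess is tolerated, and the rate sets the long-term cap.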
While you were away, I tracked this to certain CVN VMs. I prodded Krinkle (their maintainer) a couple of times, and he promptly moved the file accesses to local storage, which quickly brought the bandwidth back down to saner levels.
So user education might also help here: Krinkle didn't know there was a penalty for accessing /data :)