As early as November, we identified an issue involving the Varnish boxes and Linux's mm/kswapd behavior. It was detailed in a report at the end of November:
https://lists.wikimedia.org/mailman/private/ops/2013-November/026473.html
Since then, testing of Linux 3.11 has continued but produced unstable behavior (boxes locking up at random), and lack of time has kept us from pursuing this avenue further.
In the meantime, per Domas' suggestion, we have deployed a cronjob that runs every minute and echoes 1 > /proc/sys/vm/compact_memory. This has fixed some of the more immediate effects we were seeing (such as the XFS "deadlock detected" issue).
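For reference, a minimal sketch of what the cronjob does on each run, written in Python. The function name `trigger_compaction` is hypothetical. Writing to /proc/sys/vm/compact_memory requires root and a kernel built with compaction support. The path is a parameter only so the logic can be exercised against an ordinary file.

```python
# Sysctl file that triggers compaction of all memory zones when written to.
COMPACT_MEMORY_PATH = "/proc/sys/vm/compact_memory"


def trigger_compaction(path: str = COMPACT_MEMORY_PATH) -> None:
    """Same effect as `echo 1 > /proc/sys/vm/compact_memory` (hypothetical helper).

    Must run as root against the real path; intended to be invoked once
    per minute by cron, mirroring the deployed cronjob.
    """
    with open(path, "w") as f:
        f.write("1\n")  # echo appends a newline; the kernel accepts either form
```

Scheduling is left to cron. An /etc/cron.d entry along the lines of `* * * * * root echo 1 > /proc/sys/vm/compact_memory` gives the once-a-minute cadence described above. The exact entry we deployed may differ.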
Apparently, though, the cronjob has not fixed all of the effects. The attached graphs show cp3012 doing the same "dropping large portions of pagecache" dance today, which resulted in a user-visible spike of 503s.
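One way to flag such a pagecache drop is to compare the "Cached" field of two /proc/meminfo samples. This is a minimal sketch under stated assumptions. The helper names are hypothetical, and the 50% threshold is an illustrative value, not one taken from this report. The sampling interval is left to the caller.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values in kB)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def pagecache_drop_fraction(before: str, after: str) -> float:
    """Fraction of the "Cached" pagecache lost between two samples (0.0 if it grew)."""
    cached_before = parse_meminfo(before)["Cached"]
    cached_after = parse_meminfo(after)["Cached"]
    if cached_before == 0:
        return 0.0
    return max(0.0, (cached_before - cached_after) / cached_before)


def is_large_drop(before: str, after: str, threshold: float = 0.5) -> bool:
    """True if at least `threshold` of the pagecache vanished (hypothetical cutoff)."""
    return pagecache_drop_fraction(before, after) >= threshold
```

In practice the two samples would come from reading /proc/meminfo on the cache box some seconds apart.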
We should explore what effect newer kernels will have, possibly 3.13 now. That is the kernel trusty is being released with, and we will need to move to it eventually anyway.
Description
Details
- Reference
- rt7268
Related Objects
- Mentioned In
- rOPUPf8f989dfd510: disable compact_memory on jessie T83809
- Mentioned Here
- T86648: Upgrade all HTTP frontends to Debian jessie
Event Timeline
Change 187684 had a related patch set uploaded (by BBlack):
disable compact_memory on jessie T83809