After T58602 was solved again, memcache bandwidth usage started dropping again, but now it is even worse.
Description
Details
- Reference
- bz72024
Related Objects
- Mentioned Here
- T58602: avoid fetching SiteList object from memcached
Event Timeline
Comment Actions
If we check the result of memkeys (on eth0) to see the top keys on all 18 mc10** servers, we could check for:
a) The sites key to see if that problem is back
b) Any other key that has excessive or unexpected usage
Comment Actions
Captured about 60 seconds worth of requests to mc1001 via tcpdump:
sudo tcpdump -i eth0 -s 500 -A -t port 11211 | cut -c 9- | grep gets > tmp.txt
# Separate WANCache since it typically has one more colon-separated segment before dbname/global
cat tmp.txt | grep WANCache > tmp.wan.txt
cat tmp.txt | grep -v WANCache > tmp.nonwan.txt
# Strip ids and hashes
cat tmp.wan.txt | sed 's/:[0-9a-f][0-9a-f][0-9a-f]\+/:*/g' > tmp.wan.norm.txt
cat tmp.nonwan.txt | sed 's/:[0-9a-f][0-9a-f][0-9a-f]\+/:*/g' > tmp.nonwan.norm.txt
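To make the "strip ids and hashes" step concrete, here is a minimal sketch that applies the same sed expression to a few made-up key lines. The helper name `normalize_keys` and the sample keys are assumptions for illustration, not captured traffic, and the expression relies on GNU sed because `\+` is a GNU extension.

```shell
#!/bin/sh
# normalize_keys is a hypothetical helper name; the sed expression itself is
# the one used in the capture commands. Assumes GNU sed (\+ in a basic regex).
normalize_keys() {
  # Replace a colon followed by three or more hex characters with ":*".
  sed 's/:[0-9a-f][0-9a-f][0-9a-f]\+/:*/g'
}

# Made-up sample lines, not real captured keys.
printf '%s\n' \
  'gets WANCache:v:global:page:10:123456' \
  'gets WANCache:v:global:gadgets-definition:9:2' \
  'gets enwiki:preprocess-hash:0a1b2c3d4e5f:1' \
  | normalize_keys
# Prints:
#   gets WANCache:v:global:page:10:*
#   gets WANCache:v:global:gadgets-definition:9:2
#   gets enwiki:preprocess-hash:*:1
```

Segments of one or two characters (such as `10`, `9` or `2`) are left alone, which is why they still show up literally in aggregated key names. The expression is not anchored to the end of a segment, so a segment that merely starts with three hex characters is also rewritten (for example `:cebwiki` would become `:*wiki`).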
EDIT: See next comment
Comment Actions
Aggregated from mc10* (mc1001-mc1018) during approx. 60 seconds.
Popular WANCache keys
$ cat *.wan.norm.txt | cut -d':' -f4- | sort | uniq -c | sort -rn | head
2407183 revisiontext:textid:*
1391617 file:*
1090343 page:10:*
 782408 page:content-model:*
 687809 revision:enwiki:*:*
 632893 revision:enwiktionary:*:*
 521202 image_redirect:*
 462218 revision:commonswiki:*:*
 426820 page-restrictions:*:*
 401333 gadgets-definition:9:2
 248394 titleblacklist:normalized-unicode:*
 222898 messages:en
 216919 messages:en:hash:v1
 150800 gadgets-definition:*
 144216 Wikimedia\Rdbms\LoadBalancer:server-read-only:*
 126767 revision:zhwiki:*:*
  94437 user:id:enwiki:*
Popular local cluster keys
$ cat *.nonwan.norm.txt | cut -d':' -f2- | sort | uniq -c | sort -rn | head
 504279 Wikimedia\Rdbms\ChronologyProtector:*:v1
 310501 preprocess-hash:*:1
 130739 preprocess-hash:*:0
 113401 textextracts:*:*:en:1:1
  97314 pcache:idoptions:*
  45978 page:last-dc-purge:*
  22789 CacheAwarePropertyInfoStore
  13967 flaggedrevs:includesSynced:*
  11550 textextracts:*:*:es:1:1
  11261 textextracts:*:*:en:1:
   8685 OtherProjectsSites:*
   4602 textextracts:*:*:de:1:1
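The counting pipeline behind both listings can be sketched as a small function. The name `top_keys`, its two parameters and the sample lines are assumptions for illustration. The first argument is the first colon-separated field to keep (4 for WANCache keys, 2 for local cluster keys).

```shell
#!/bin/sh
# top_keys is a hypothetical helper wrapping the cut/sort/uniq pipeline.
#   $1: first colon-separated field to keep (4 for WANCache, 2 for local keys)
#   $2: number of rows to print (defaults to 10, like plain `head`)
top_keys() {
  cut -d':' -f"$1"- | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Made-up, already normalized sample lines.
sample() {
  printf '%s\n' \
    'gets WANCache:v:global:file:*' \
    'gets WANCache:v:global:page:10:*' \
    'gets WANCache:v:global:file:*'
}

# Count each normalized WANCache key and list the most frequent first.
sample | top_keys 4
```

On the sample input this lists `file:*` with a count of 2 ahead of `page:10:*` with a count of 1. `uniq -c` pads the counts with leading spaces.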
- Total number of gets captured: 13,929,151 lines (about 13.9 million)
- WANCache: 12,397,808 (89%)
- Local cluster: 1,531,343 (11%)
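As a quick sanity check of the split above, the shares can be recomputed from the reported counts. The helper name `share` and the variable names are assumptions for illustration. The numbers are the ones listed above.

```shell
#!/bin/sh
# Counts as reported in the summary above.
total=13929151
wan=12397808
local_cluster=1531343

# share is a hypothetical helper: prints part/total as a percentage
# with one decimal place.
share() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

echo "sum of parts:  $((wan + local_cluster))"   # 13929151, equal to the total
echo "WANCache:      $(share "$wan" "$total")%"             # 89.0%
echo "Local cluster: $(share "$local_cluster" "$total")%"   # 11.0%
```

The two parts add up exactly to the total, and the rounded shares match the 89% and 11% figures.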
(Unrelated: During the capture, 0 mc gets were received by mc1005 - not pooled?).