
Hunt down duplicate memcached gets
Closed, Declined · Public

Description

Chase down all code paths that, without good reason, fetch the same memcached key multiple times in the context of a single request.
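For illustration (this is not MediaWiki code; the class and function names are hypothetical), duplicate gets typically arise when independent helpers each fetch the same key per call, so one request hits memcached repeatedly for identical data:

```python
# Illustrative sketch: a stand-in cache client that counts gets per key,
# and two helpers that each fetch the same key independently.
class CountingCache(object):
    """Stand-in for a memcached client; records how often each key is read."""
    def __init__(self, data):
        self.data = data
        self.gets = {}

    def get(self, key):
        self.gets[key] = self.gets.get(key, 0) + 1
        return self.data.get(key)

def page_title(cache, page_id):
    # Fetches the whole row just to read one field...
    return cache.get('page:%d' % page_id)['title']

def page_length(cache, page_id):
    # ...and this helper fetches the exact same row again.
    return cache.get('page:%d' % page_id)['length']

cache = CountingCache({'page:1': {'title': 'Foo', 'length': 42}})
page_title(cache, 1)
page_length(cache, 1)
# Two call sites, one "request", two gets for the same key:
assert cache.gets['page:1'] == 2
```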

Event Timeline

ori created this task. Mar 2 2016, 12:58 AM
Restricted Application added a subscriber: Aklapper. Mar 2 2016, 12:58 AM
ori added a comment. Edited · Mar 2 2016, 1:15 AM

I whipped up this script to catch dupes:

#!/usr/bin/python -u
# -*- coding: utf-8 -*-
"""
  memc-dupes
  ~~~~~~~~~~

  Snoop outbound memcached traffic for dupe gets (that is, keys that are
  retrieved multiple times in rapid succession).

  Usage:

    memc-dupes [--window-size SIZE] [--dupe-threshold THRESHOLD]

    --window-size SIZE          look for dupe gets in a sliding window
                                of this many keys (default: 100).

    --dupe-threshold THRESHOLD  print keys that repeat this many times
                                or more (default: 5).


  Copyright 2016 Ori Livneh <ori@wikimedia.org>

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

"""
import sys
reload(sys)  # Python 2 only: allow re-setting the default string encoding.
sys.setdefaultencoding('utf-8')

import argparse
import collections
import os
import re
import subprocess


def iter_keys():
    """Spawn tcpdump and yield memcached keys requested by this host."""
    with open(os.devnull, 'w') as devnull:
        cmd = ('/usr/bin/sudo', '/usr/sbin/tcpdump', '-qAt', 'port', '11211')
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=devnull)
        for line in iter(proc.stdout.readline, b''):
            for key in re.findall(r'gets?( \S+)+', line):
                yield key.strip()


ap = argparse.ArgumentParser(
    description='snoop memcached traffic for dupe gets')
ap.add_argument('--window-size', help='look for dupe gets in a sliding window'
                ' of this many keys (default: 100)', default=100, type=int)
ap.add_argument('--dupe-threshold', help='print keys that repeat this many'
                ' times or more (default: 5)', default=5, type=int)
args = ap.parse_args()

recent_keys = collections.deque(maxlen=args.window_size)
seen_counts = collections.defaultdict(int)

print('Looking for keys that repeat %(dupe_threshold)s+ times '
      'in a sliding window of %(window_size)s keys:' % vars(args))

for key in iter_keys():
    recent_keys.append(key)
    seen_counts[key] += 1
    for seen_key, times_seen in seen_counts.items():
        if seen_key not in recent_keys:
            # Key has aged out of the window entirely; forget its count.
            del seen_counts[seen_key]
        elif times_seen >= args.dupe_threshold:
            print('%2s times: %s' % (times_seen, seen_key))

Some egregious offenders, captured on mw1076:

59 times: WANCache:v:fawiki:file-djvu:dimensions:kwzejl133upgwmfx7xzaxo8makcjosi
59 times: WANCache:v:commonswiki:file-djvu:dimensions:nces1uof6ntvew3nsrr9timxjcw7bcf
57 times: WANCache:v:enwiki:revisiontext:textid:674199246
51 times: WANCache:v:enwiki:SiteStats:groupcounts:sysop
36 times: WANCache:t:eswiki:gadgets-definition:9:2
32 times: WANCache:t:commonswiki:gadgets-definition:9:2
26 times: WANCache:v:enwiki:revisiontext:textid:664533748
25 times: commonswiki:pagetranslation:sourcepages
25 times: WANCache:v:enwiki:revisiontext:textid:657411725
25 times: WANCache:v:enwiki:revisiontext:textid:607643685
24 times: WANCache:v:enwiki:revisiontext:textid:622844668
24 times: WANCache:v:enwiki:revisiontext:textid:611589557
19 times: WANCache:v:jawiki:revisiontext:textid:48538892
19 times: WANCache:v:jawiki:revisiontext:textid:42542050
17 times: WANCache:v:itwikisource:file-djvu:dimensions:q2lvtjhv5om2d9lnawvfbwqpkrbrhx1
17 times: WANCache:v:frwikisource:file-djvu:dimensions:0teduf9evyos2mqtghzco1v63oqxhj9
16 times: WANCache:t:enwiki:gadgets-definition:9:2
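The windowed counting can be exercised offline, without tcpdump. Here is a minimal sketch of the same idea as a pure function (the name `find_dupes` is mine, and it uses a slight variant that decrements a key's count as it ages out of the window, rather than the script's delete-on-exit):

```python
import collections

def find_dupes(keys, window_size=100, dupe_threshold=5):
    """Return {key: highest count seen} for keys that repeat at least
    dupe_threshold times within a sliding window of window_size keys."""
    recent = collections.deque(maxlen=window_size)
    counts = collections.Counter()
    dupes = {}
    for key in keys:
        if len(recent) == recent.maxlen:
            counts[recent[0]] -= 1  # oldest key is about to fall out
        recent.append(key)
        counts[key] += 1
        if counts[key] >= dupe_threshold:
            dupes[key] = max(dupes.get(key, 0), counts[key])
    return dupes

stream = ['a', 'b'] * 10 + ['c']
print(find_dupes(stream, window_size=5, dupe_threshold=3))
# → {'a': 3, 'b': 3}
```

With a window of 5 over an alternating `a`/`b` stream, each key peaks at 3 occurrences in the window, so both cross the threshold while `c` never does.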
Krinkle added a subscriber: Krinkle. Edited · Mar 2 2016, 1:38 AM

I would strongly urge us to stay away from fix-it-all solutions that just process-cache entire BagOStuffs. Code really shouldn't be spreading its lookups in the first place. Such caching complicates things a lot. I've seen this happen in other applications and it never ends well.

On the other hand, re-using values as actual data via descriptive methods that lazy-load the value in-object is sweet. And static caches for entire objects are great too (such as for Title).

I'd rather we invest a little extra discipline in centralising lookup responsibility where that isn't currently the case, and add more functional caching (lazy-loading data and caching entire class objects).
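The lazy-load pattern described above can be sketched as follows (the class, its fields, and the cache key are hypothetical, and the cache client is a stand-in): the object looks up its backing key at most once, and every accessor reads the in-object copy, so duplicate gets become structurally impossible.

```python
class SiteStats(object):
    """Hypothetical value object: fetches its backing cache row at most
    once, then serves every accessor from the in-object copy."""
    def __init__(self, cache, wiki):
        self._cache = cache
        self._wiki = wiki
        self._row = None  # not fetched yet

    def _load(self):
        # Lazy-load: hit the cache only on first access.
        if self._row is None:
            self._row = self._cache.get('WANCache:v:%s:SiteStats' % self._wiki)
        return self._row

    def group_count(self, group):
        return self._load()['groupcounts'][group]

    def edits(self):
        return self._load()['edits']


class OneShotCache(object):
    """Stand-in cache client that fails loudly on a duplicate get."""
    def __init__(self, data):
        self.data, self.seen = data, set()

    def get(self, key):
        assert key not in self.seen, 'duplicate get: %s' % key
        self.seen.add(key)
        return self.data[key]


cache = OneShotCache({'WANCache:v:enwiki:SiteStats':
                      {'groupcounts': {'sysop': 850}, 'edits': 900000000}})
stats = SiteStats(cache, 'enwiki')
stats.group_count('sysop')
stats.edits()  # second accessor, but no second memcached get
```

The point of the design is that callers never decide when to fetch; the object owns its one lookup, which centralises the responsibility as suggested.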

Agabi10 added a subscriber: Agabi10. Mar 2 2016, 2:04 AM
aaron triaged this task as Medium priority. Mar 14 2016, 9:30 PM
aaron moved this task from Inbox to Backlog: Small & Maintenance on the Performance-Team board.
Gilles closed this task as Declined. Dec 7 2016, 7:26 PM
Gilles added a subscriber: Gilles.

There are MediaWiki warnings in place now for that situation.