
Investigate dynomite for WANObjectCache support
Closed, Resolved, Public

Description

See https://github.com/Netflix/dynomite/ .

This may work as a simpler (to compile/configure) alternative to mcrouter. I need to see what kind of multi-DC support it has. The support mcrouter has is just best-effort replication of all operations of a certain type, with failures logged per host to a file that nothing actually consumes. That should not be hard to match. Routing by DC prefix (as long as the prefix doesn't show up in keys), or similar features in other systems, could be supported by WAN cache without much effort if needed.
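To make the "DC prefix" idea concrete, here is a minimal sketch of how a router could pick a target data center from a key prefix and fall back to the local DC otherwise. The `/<dc>/` prefix syntax, the DC names `eqiad`/`codfw`, and `route_key` are all hypothetical; this is not how mcrouter, dynomite, or WANObjectCache actually encode routing.

```python
# Hypothetical sketch: choose a data center from a key prefix.
# The "/<dc>/" syntax and the DC names are assumptions for illustration only;
# they are not taken from mcrouter, dynomite, or WANObjectCache.
KNOWN_DCS = {"eqiad", "codfw"}
LOCAL_DC = "eqiad"


def route_key(key: str) -> tuple[str, str]:
    """Return (target_dc, bare_key) for a key that may carry a DC prefix."""
    if key.startswith("/"):
        # "/codfw/user:42" -> ["", "codfw", "user:42"]
        parts = key.split("/", 2)
        if len(parts) == 3 and parts[1] in KNOWN_DCS and parts[2]:
            return parts[1], parts[2]
    # No recognised prefix: serve the key, unchanged, from the local DC.
    return LOCAL_DC, key


print(route_key("/codfw/user:42"))  # ('codfw', 'user:42')
print(route_key("user:42"))         # ('eqiad', 'user:42')
```

An ordinary key that happened to begin with `/codfw/` would be misrouted, which is exactly why the prefix must never appear in real keys.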

Event Timeline

aaron created this task. Feb 1 2017, 5:57 PM
Restricted Application added a subscriber: Aklapper. Feb 1 2017, 5:57 PM
aaron triaged this task as Normal priority. Feb 1 2017, 8:00 PM
aaron moved this task from Inbox to Doing on the Performance-Team board.
elukey added a subscriber: elukey. May 4 2017, 12:56 PM
aaron closed this task as Resolved. May 24 2017, 6:33 PM

So, having looked at this for a while, I think it's doable, but less desirable than mcrouter. I don't really see it having any edge here (https://github.com/Netflix/dynomite/wiki/FAQ is not terribly convincing either).

mcrouter (pros):

  • Well maintained (solid commit rate on github)
  • Decently documented at this point (on github wiki)
  • Battle-tested for performance/stability on a *very* high-traffic website
  • Highly configurable. We can do things like a) replicate only purges, b) replicate purges to the target (hash) server or to all servers in the pool for robustness, c) replicate everything, including cache-aside writes (CAS/ADD for WANCache), d) do warm-up logic, and e) do different things for different pools. I'd like WAN cache to just replicate purges (SET/DELETE), but if we want something that replicates incr/add/cas we can do that. We can also have a BagOStuff class that replicates set() (an actual SET in that case).
  • If we run it on the maintenance/hhvm hosts, it can replace twemproxy, keeping the stack simpler (when used with memcached)
  • We already have a wmf package
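Options a) and b) above can be sketched as a small routing planner: purges (SET/DELETE) fan out to every DC's pool, either to the key's hash target or to every server, while everything else stays local. The pool layout, addresses, and function names are hypothetical, and md5 modulo N is only a stand-in for mcrouter's own consistent hashing.

```python
import hashlib

# Hypothetical two-DC layout; addresses are placeholders, not real hosts.
POOLS = {
    "local": ["10.0.0.1:11211", "10.0.0.2:11211"],
    "remote": ["10.1.0.1:11211", "10.1.0.2:11211"],
}
PURGE_OPS = {"set", "delete"}  # the operations WAN cache would replicate


def hash_server(key: str, servers: list[str]) -> str:
    """Pick one server for a key (md5 mod N stands in for mcrouter's hashing)."""
    digest = hashlib.md5(key.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def plan(op: str, key: str, broadcast_purges: bool = False) -> list[tuple[str, str]]:
    """Return the (pool, server) pairs a request should be sent to."""
    if op not in PURGE_OPS:
        # Reads and cache-aside writes (add/cas) only touch the local pool.
        return [("local", hash_server(key, POOLS["local"]))]
    targets = []
    for pool, servers in POOLS.items():
        if broadcast_purges:
            # Robust variant of b): purge every server in every pool.
            targets.extend((pool, s) for s in servers)
        else:
            # Default variant of b): purge only the key's hash target in each pool.
            targets.append((pool, hash_server(key, servers)))
    return targets


print(len(plan("get", "user:42")))           # 1 (local only)
print(len(plan("set", "user:42")))           # 2 (one per pool)
print(len(plan("delete", "user:42", True)))  # 4 (every server)
```

Option c) would amount to adding `add`/`cas` to `PURGE_OPS`.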

mcrouter (cons):

  • Code is complex with lots of dependencies (folly, wangle, etc.)
    • OTOH: Already packaged, and probably not a big deal as long as it is stable (which it appears to be, given its main user and the load it handles)
  • Cannot talk directly to redis
    • OTOH: We can always have twemproxy (or even dynomite, technically) instances local to each cache server that act as a broker for mcrouter by speaking the memcached ASCII protocol and turning that into commands to the local redis server. This is no more complex than twemproxy => dynomite => redis. In any case, redis support is not even relevant to the WAN cache (memcached is good enough), but it could perhaps be useful for sessions.
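For illustration, this is roughly the translation such a broker would need to perform, sketched for `set` and `delete` only. It is a hypothetical helper, not how twemproxy or dynomite is implemented; it assumes the memcached rule that exptimes above 30 days are absolute Unix timestamps and maps those to Redis `EXAT` (available since Redis 6.2).

```python
from typing import Optional

# Hypothetical broker sketch: map a memcached ASCII command onto the Redis
# command a local broker would issue. Only "set" and "delete" are handled;
# memcached flags and "noreply" are ignored.
THIRTY_DAYS = 60 * 60 * 24 * 30  # memcached treats larger exptimes as Unix timestamps


def translate(line: str, data: Optional[bytes] = None) -> list:
    parts = line.split()
    cmd = parts[0]
    if cmd == "set":
        # "set <key> <flags> <exptime> <bytes>"
        key, exptime, nbytes = parts[1], int(parts[3]), int(parts[4])
        if data is None or len(data) != nbytes:
            raise ValueError("data block length does not match the header")
        if exptime < 0:
            raise ValueError("negative exptime is not handled in this sketch")
        if exptime > THIRTY_DAYS:
            return ["SET", key, data, "EXAT", exptime]  # absolute expiry time
        if exptime > 0:
            return ["SET", key, data, "EX", exptime]  # relative TTL in seconds
        return ["SET", key, data]  # exptime 0: no expiry
    if cmd == "delete":
        return ["DEL", parts[1]]
    raise NotImplementedError(f"unsupported command: {cmd}")


print(translate("set foo 0 60 3", b"bar"))  # ['SET', 'foo', b'bar', 'EX', 60]
print(translate("delete foo"))              # ['DEL', 'foo']
```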

dynomite (pros):

  • Natively talks to redis
  • Simpler codebase
  • Far fewer dependencies (libtool autoconf automake libssl-dev)

dynomite (cons):

  • Lower commit rate on github
  • Focused on redis (memcached isn't really used by Netflix, so support will be worse)
  • Not all basic redis commands work yet anyway (https://github.com/Netflix/dynomite/issues/49)
  • Worse documentation, even for basic things like PEM and entropy files (which make the service segfault or error out unless configured properly). The conf/ dir in the repo provides some dummy PEM and tokens (which can be moved to /etc/ and configured for use via some undocumented settings in the YAML file)
  • No wmf package yet (not a huge deal though)
  • Runs locally on the cache server, so we'd still need twemproxy

mcrouter speeds seem comparable to twemproxy on the labs "tin" host:

> aaron@deployment-tin:~$ mwscript eval.php enwiki
> 

> $cmr = ObjectCache::newFromParams( [ 'class' => 'MemcachedPeclBagOStuff', 'servers' => [ '127.0.0.1:11213' ], 'persistent' => false ] );

> $ctp = ObjectCache::getLocalClusterInstance();

> $fs = function ( $c ) { $bad = 0; $t = microtime(true); for ( $i=0; $i<5000; ++$i ) { $bad += (int)!$c->set( "key$i", 1, 60 );} var_dump( microtime(true) - $t, $bad ); }

> $fg = function ( $c ) { $bad = 0; $t = microtime(true); for ( $i=0; $i<5000; ++$i ) { $bad += (int)!$c->get( "key$i" ); } var_dump( microtime(true) - $t, $bad ); }

> echo "mcrouter (SET) [sec, failures]\n"; $fs($cmr); // mcrouter => memcached
mcrouter (SET) [sec, failures]
float(6.9841389656067)
int(0)

> echo "twemproxy (SET) [sec, failures]\n";$fs($ctp); // twemproxy => memcached
twemproxy (SET) [sec, failures]
float(6.8755619525909)
int(0)

> echo "mcrouter (GET) [sec, failures]\n"; $fg($cmr); // mcrouter => memcached
mcrouter (GET) [sec, failures]
float(6.0539009571075)
int(0)

> echo "twemproxy (GET) [sec, failures]\n";$fg($ctp); // twemproxy => memcached
twemproxy (GET) [sec, failures]
float(6.2721688747406)
int(0)
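Each loop above issues 5000 sequential requests, so the totals convert directly into average time per request. A quick calculation over the recorded numbers (the helper names are ours):

```python
# Totals copied from the eval.php session above: 5000 sequential requests each.
OPS = 5000
TIMINGS = {
    ("mcrouter", "SET"): 6.9841389656067,
    ("twemproxy", "SET"): 6.8755619525909,
    ("mcrouter", "GET"): 6.0539009571075,
    ("twemproxy", "GET"): 6.2721688747406,
}


def per_op_ms(total_secs: float, ops: int = OPS) -> float:
    """Average wall-clock time per request, in milliseconds."""
    return total_secs / ops * 1000


def mcrouter_vs_twemproxy(op: str) -> float:
    """Relative difference of mcrouter over twemproxy (positive = slower)."""
    return TIMINGS[("mcrouter", op)] / TIMINGS[("twemproxy", op)] - 1


for (proxy, op), secs in TIMINGS.items():
    print(f"{proxy:<9} {op}: {per_op_ms(secs):.3f} ms/op")
for op in ("SET", "GET"):
    print(f"{op}: mcrouter vs twemproxy {mcrouter_vs_twemproxy(op):+.1%}")
```

This works out to about 1.4 ms per SET and 1.2 to 1.3 ms per GET through either proxy, with mcrouter about 1.6% slower on SET and 3.5% faster on GET. Each figure comes from a single run and includes PHP client overhead, so differences of a few percent should not be over-interpreted.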

The above test used the following mcrouter config:

{
    "pools": {
        "main": {
            "servers": [ "10.68.23.25:11211", "10.68.23.49:11211" ]
        }
    },
    "route": {
        "type": "OperationSelectorRoute",
        "default_policy": "PoolRoute|main",
        "operation_policies": {
            "set": {
                "type": "AllFastestRoute",
                "children": [ "PoolRoute|main" ]
            },
            "delete": {
                "type": "AllFastestRoute",
                "children": [ "PoolRoute|main" ]
            }
        }
    }
}
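OperationSelectorRoute picks a child route by operation name and falls back to `default_policy` otherwise. A small Python model of that dispatch over the same config may help when reading it; this mimics only the selection logic, is not mcrouter code, and understands nothing beyond the `Type|pool` shorthand and `children` lists.

```python
import json

# Same route config as above.
CONFIG = json.loads("""
{
  "pools": {"main": {"servers": ["10.68.23.25:11211", "10.68.23.49:11211"]}},
  "route": {
    "type": "OperationSelectorRoute",
    "default_policy": "PoolRoute|main",
    "operation_policies": {
      "set": {"type": "AllFastestRoute", "children": ["PoolRoute|main"]},
      "delete": {"type": "AllFastestRoute", "children": ["PoolRoute|main"]}
    }
  }
}
""")


def select_policy(op: str, route: dict):
    """Mimic OperationSelectorRoute: per-operation policy, else the default."""
    return route.get("operation_policies", {}).get(op, route["default_policy"])


def describe(policy) -> str:
    """Render a policy as text; strings are the "Type|pool" shorthand."""
    if isinstance(policy, str):
        kind, _, pool = policy.partition("|")
        return f"{kind} -> pool '{pool}'"
    children = ", ".join(describe(c) for c in policy["children"])
    return f"{policy['type']}[{children}]"


for op in ("get", "set", "delete", "add"):
    print(op, "=>", describe(select_policy(op, CONFIG["route"])))
```

With a single child, AllFastestRoute behaves like the plain PoolRoute; the fan-out only matters once a second (e.g. remote DC) pool route is added to `children`.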

We'd probably use AllFastestRoute or AllAsyncRoute. The former makes it unlikely that a later get() in the same request returns the old value (the local PoolRoute server will generally respond before a remote, 35ms away, PoolRoute one), whereas the latter might be slightly faster.
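The trade-off can be sketched with a simplified latency model: AllFastestRoute returns with the first child reply, AllAsyncRoute returns without waiting, and AllSyncRoute (included only for contrast) waits for every child. The 0.5 ms local latency is an assumed figure, 35 ms is the cross-DC figure quoted above, and treating a reply time as the moment the write becomes visible is a deliberate simplification.

```python
# Simplified latency model, not mcrouter code. Each child's reply time is also
# treated as the moment its write becomes visible. The 0.5 ms local latency is
# an assumed figure; 35 ms is the cross-DC figure quoted above.
LOCAL_MS, REMOTE_MS = 0.5, 35.0


def client_latency_ms(route: str, child_ms: list[float]) -> float:
    """How long the caller waits for a replicated SET/DELETE to return."""
    if route == "AllFastestRoute":
        return min(child_ms)  # first reply wins; the rest finish in the background
    if route == "AllAsyncRoute":
        return 0.0  # fire-and-forget: returns without waiting for any child
    if route == "AllSyncRoute":
        return max(child_ms)  # waits for every child
    raise ValueError(f"unknown route: {route}")


def local_write_visible(route: str, get_delay_ms: float = 0.0) -> bool:
    """Would a get() issued right after the write see the new local value?"""
    returned_at = client_latency_ms(route, [LOCAL_MS, REMOTE_MS])
    return returned_at + get_delay_ms >= LOCAL_MS


for route in ("AllFastestRoute", "AllAsyncRoute", "AllSyncRoute"):
    print(route, client_latency_ms(route, [LOCAL_MS, REMOTE_MS]),
          local_write_visible(route))
```

Under these assumptions AllFastestRoute costs only the local round trip while still letting a follow-up get() see the new value, whereas AllAsyncRoute saves that round trip at the risk of a stale read.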

Basically, mcrouter seems like it can do whatever we'd want to do with dynomite and more, and it is more "ready to go" at this point.