
zookeeper evaluation
Closed, Resolved, Public

Description

The cluster is set up on zk{1,2,3}.eqiad.wmflabs

Event Timeline

Joe claimed this task.
Joe raised the priority of this task to Medium.
Joe updated the task description.
Joe added a project: acl*sre-team.
Joe added subscribers: akosiaris, fgiunchedi, mobrovac and 3 others.

Dynamic reconfiguration is only supported from ZK 3.5 onwards, so we need to use that version, but jessie only ships 3.4.5 at the moment.
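For reference, this is roughly what a ZK 3.5 dynamic membership file for the evaluation cluster could look like, rendered from Python. It's a minimal sketch: the ports are the conventional ZooKeeper defaults (2888 quorum, 3888 election, 2181 client), assumed rather than taken from the labs setup, and dynamic_config is a hypothetical helper. With 3.5 these entries can be changed at runtime through the reconfig command; on 3.4.5 membership is static and changing it means editing the config and doing a rolling restart.

```python
from typing import List, Tuple

# Conventional ZooKeeper default ports (assumed; adjust to the real deployment).
QUORUM_PORT = 2888     # follower <-> leader traffic
ELECTION_PORT = 3888   # leader election
CLIENT_PORT = 2181     # client connections


def dynamic_config(servers: List[Tuple[int, str, str]]) -> str:
    """Render ZooKeeper 3.5 dynamic-config lines (hypothetical helper).

    Each entry is (server id, hostname, role), where role is
    "participant" (votes in the quorum) or "observer" (does not vote).
    Line format: server.<id>=<host>:<quorum>:<election>:<role>;<client port>
    """
    lines = []
    for sid, host, role in servers:
        if role not in ("participant", "observer"):
            raise ValueError(f"unknown role: {role}")
        lines.append(
            f"server.{sid}={host}:{QUORUM_PORT}:{ELECTION_PORT}:{role};{CLIENT_PORT}"
        )
    return "\n".join(lines)


# The three evaluation hosts, all voting members.
hosts = [(i, f"zk{i}.eqiad.wmflabs", "participant") for i in (1, 2, 3)]
print(dynamic_config(hosts))
```

The role field is also where observer nodes (mentioned under Cons below) would be declared: an observer receives updates but does not vote, so adding one does not change the quorum size.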

ZooKeeper is a solid, proven (https://aphyr.com/posts/291-call-me-maybe-zookeeper) distributed key-value store with good performance and reliability.

Pros:

  • It's backed by a large community, and several companies contribute to its development
  • It's the de facto off-the-shelf standard choice for this task
  • It has quality libraries in most languages

Cons:

  • It's Java
  • Imagine that the preceding point has been repeated 5 times
  • It uses a purpose-built binary protocol, so you can't read or write data via telnet/curl (only the four-letter admin commands work that way)
  • It shows its age: a few things are clearly better organized in more modern systems
  • Unless a sufficiently high-level library (like Curator or Kazoo) is used, the developer has to handle all the little nuances of the disconnection/error/retry logic, or get into serious trouble
  • No built-in multi-DC support: even with observer nodes, switching over from one datacenter to another is going to be painful (and ZK lacks a backup/recovery tool)
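To make the point about retry logic concrete, here is a minimal sketch of the kind of loop a low-level client leaves to the developer, and that Curator or Kazoo provide out of the box. ConnectionLossError and with_retry are hypothetical names, not part of either library's API, and the backoff parameters are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConnectionLossError(Exception):
    """Hypothetical stand-in for a client library's 'connection lost' error."""


def with_retry(op: Callable[[], T], max_tries: int = 5,
               base_delay: float = 0.1, max_delay: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(), retrying on connection loss with capped exponential backoff."""
    if max_tries < 1:
        raise ValueError("max_tries must be >= 1")
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            return op()
        except ConnectionLossError:
            if attempt == max_tries:
                raise  # give up and surface the error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to the cap
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: a fake read that loses the connection twice before succeeding.
calls = {"n": 0}


def flaky_read() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionLossError("lost connection to the ensemble")
    return "value"


delays = []
print(with_retry(flaky_read, sleep=delays.append))  # value
print(delays)  # [0.1, 0.2]
```

Even this is only part of the story: a create that succeeded just before the connection dropped will fail with a "node exists" error when retried, and an expired session (which also removes its ephemeral nodes) can't be fixed by retrying at all. Those are the nuances the higher-level libraries take care of.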

So if we want a *really* boring technology, we should pick ZK.

Joe set Security to None.