tracking graphite scaling work in this ticket; see also https://wikitech.wikimedia.org/wiki/Graphite/Scaling for a general overview of the options and http://etherpad.wikimedia.org/p/graphitetodo for scratchpad notes.
currently on the plate:
- get two SSD machines (1x eqiad 1x codfw) https://rt.wikimedia.org/Ticket/Display.html?id=9105
- use carbon-c-relay to mirror metrics to both sites https://gerrit.wikimedia.org/r/181080
- adapt txstatsd to use plaintext/line metric protocol https://gerrit.wikimedia.org/r/180786
- do an initial import of metrics from carbon to new SSD machines
- flip traffic to new machines
- backfill remaining metrics with carbonate
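
for reference, the plaintext/line protocol mentioned above is one `<metric path> <value> <timestamp>` line per datapoint, sent over TCP (carbon listens on port 2003 by default). a minimal sketch of an emitter follows; the function names and the hostname in the usage note are made up for illustration and are not taken from txstatsd or the linked changes:

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Render one datapoint as a plaintext protocol line."""
    if timestamp is None:
        timestamp = time.time()
    # One datapoint per line: "<path> <value> <unix timestamp>\n"
    return "%s %s %d\n" % (path, value, int(timestamp))


def send_metrics(lines, host, port=2003):
    """Send already-formatted lines to a carbon (or relay) plaintext listener.

    `host` is whatever receiver is in use; 2003 is carbon's default
    plaintext port and may differ in a given deployment.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

usage would look something like `send_metrics([format_metric("test.scaling.ping", 1)], "graphite.example.org")`, with a placeholder hostname.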
future plans:
- expand the setup beyond 1+1 machines using graphite clustering
- route/cluster metrics with carbon-c-relay and carbon consistent hashing
- use carbonate to rebalance metric data across machines, using the same consistent hashing
- consider hybrid caching with bcache (i.e. spinning disks + block caching on ssd)
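
to illustrate why consistent hashing helps with routing and rebalancing, here is a generic hash ring sketch. it is not carbon's or carbon-c-relay's actual ring implementation (those differ in hash width and key format); the class name, node names and replica count are assumptions for illustration only:

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        # Place `replicas` points per node on the ring so load spreads evenly.
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash("%s:%d" % (node, i)), node))
        ring.sort()
        self._ring = ring
        self._positions = [pos for pos, _ in ring]

    @staticmethod
    def _hash(key):
        # md5 is used here only as a stable, well-distributed hash.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def get_node(self, metric):
        """Return the node owning `metric`: the next ring point clockwise."""
        idx = bisect.bisect(self._positions, self._hash(metric))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical node names; adding a third node only moves metrics onto it.
before = HashRing(["node-a", "node-b"])
after = HashRing(["node-a", "node-b", "node-c"])
metrics = ["servers.host%d.cpu.user" % n for n in range(1000)]
moved = [m for m in metrics if before.get_node(m) != after.get_node(m)]
```

the relevant property is that when a node is added, the only metrics that change owner are the ones now assigned to the new node, so a rebalancing tool only has to move that subset, provided the relay and the tool agree on the same ring.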