Today I had to manually restart one of the carbon-cache processes on graphite1003 because it was killed by the oom-killer.
It seems that graphite1003 is quite short on memory:
  $ free -m
               total       used       free     shared    buffers     cached
  Mem:         64267      64039        228       1347          1       2732
  -/+ buffers/cache:      61305       2962
  Swap:          255        249          6
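The `-/+ buffers/cache` line shows how much memory is really in use by processes: "used" minus buffers and page cache, and "free" plus buffers and page cache. The minimal sketch below redoes that arithmetic from the `Mem:` line of the old-style `free -m` output pasted above. The function name `adjusted_memory` is hypothetical and not part of any tool.

```python
def adjusted_memory(free_output: str) -> tuple[int, int]:
    """Return (used, free) in MiB with buffers/cache excluded.

    Assumes the old procps `free -m` layout, where the `Mem:` line has
    total, used, free, shared, buffers and cached columns.
    """
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total, used, free, shared, buffers, cached = map(int, fields[1:7])
            # Memory held by buffers/cache can be reclaimed, so move it
            # from the "used" side to the "free" side.
            return used - buffers - cached, free + buffers + cached
    raise ValueError("no 'Mem:' line found in free output")


# Output copied from graphite1003 in this report.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:         64267      64039        228       1347          1       2732
-/+ buffers/cache:      61305       2962
Swap:          255        249          6
"""

if __name__ == "__main__":
    print(adjusted_memory(SAMPLE))
```

This gives roughly 61306 MiB used and 2961 MiB free. Those figures differ from the values `free` prints by 1 MiB only because `free` computes in KiB before rounding. Either way, less than 3 GiB of the 64 GiB is reclaimable, and swap is almost full.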
Of course, the memory is used by all the carbon-cache and uwsgi-graphite-web processes.
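One way to confirm how much memory each group of processes holds is to sum resident set sizes from `ps -eo rss=,args=`. That command prints RSS in KiB and the full command line, with no header. The minimal sketch below assumes the carbon-cache and uwsgi-graphite-web processes can be recognised by the substrings `carbon-cache` and `uwsgi` in their command lines. That matching rule and the helper name `rss_by_pattern` are illustrative assumptions.

```python
def rss_by_pattern(ps_output: str, patterns: list[str]) -> dict[str, int]:
    """Sum RSS (KiB) per pattern from `ps -eo rss=,args=` output.

    Each process is counted under the first pattern found in its command
    line (hypothetical matching rule); processes matching none are skipped.
    """
    totals = {p: 0 for p in patterns}
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        rss_kib, args = int(parts[0]), parts[1]
        for pattern in patterns:
            if pattern in args:
                totals[pattern] += rss_kib
                break
    return totals
```

On the host, the input could come from `subprocess.run(["ps", "-eo", "rss=,args="], capture_output=True, text=True, check=True).stdout`. Note that RSS double-counts shared pages, so the totals are an upper bound on the real footprint.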
In Grafana there was a spike in swap usage at 00:03. No other metric seems to show a spike at that time.