System load on graphite1003 has [[https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?var-server=graphite1003:9100&var-datasource=eqiad%20prometheus%2Fops&from=1484807212587&to=|gone up significantly]] since 2017-01-20 around 22:00.
As a result, the OOM killer kicked in a couple of times, with carbon-cache@c.service being the victim:
```
[Sat Jan 21 04:05:59 2017] Out of memory: Kill process 4879 (carbon-cache) score 62 or sacrifice child
[Sat Jan 21 04:05:59 2017] Killed process 4879 (carbon-cache) total-vm:4217132kB, anon-rss:4126184kB, file-rss:1748kB
```
@Volans and I restarted carbon-cache@c.service by hand [[https://wikitech.wikimedia.org/wiki/Server_Admin_Log#2017-01-21|when that happened]].
We should figure out what is driving graphite1003's load, and consider auto-restarting the service when it fails.
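
For the auto-restart part, here is a minimal sketch of a systemd drop-in. It assumes the instances come from a `carbon-cache@.service` template unit and that nothing else already sets `Restart=`. The file name `restart.conf` and the 10 second delay are illustrative choices, not taken from the current setup:
```
# /etc/systemd/system/carbon-cache@.service.d/restart.conf (hypothetical path)
[Service]
# Restart when the process exits uncleanly, which includes being
# killed with SIGKILL by the OOM killer.
Restart=on-failure
# Wait a bit before restarting (illustrative value).
RestartSec=10
```
After a `systemctl daemon-reload` this would apply to every instance of the template. It would only paper over the OOM kills, so the load increase itself still needs investigating.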