Revise and improve Graphite backfill procedure
The current backfill procedure for Graphite has been shown to lose data, so we need to at least:

  • Revise the backfill procedure to be more robust in the face of similar failures in the future (e.g. run a full rsync first, then backfill only the gap)
  • Perform validation post-sync / post-backfill to check that the number of datapoints in each metric file is roughly in sync between hosts
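
The validation step could look roughly like the sketch below. It assumes each host has already produced a mapping from metric file path to its count of non-null datapoints; how those counts are collected is left out here. The function name `compare_datapoint_counts`, the 5% default tolerance, and the example paths are illustrative assumptions, not part of any existing tooling.

```python
def compare_datapoint_counts(local, remote, tolerance=0.05):
    """Return metric files whose datapoint counts diverge between two hosts.

    local and remote map a metric file's relative path to its count of
    non-null datapoints (hypothetical input format). tolerance is the
    allowed relative difference, since "roughly in sync" is the goal.
    """
    problems = {}
    for path in sorted(set(local) | set(remote)):
        a = local.get(path)
        b = remote.get(path)
        if a is None or b is None:
            # The file exists on only one of the two hosts.
            problems[path] = (a, b)
            continue
        largest = max(a, b)
        # Skip the ratio when both counts are zero to avoid dividing by zero.
        if largest and abs(a - b) / largest > tolerance:
            problems[path] = (a, b)
    return problems


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    local = {"servers/web1/cpu.wsp": 1000, "servers/web1/mem.wsp": 1000}
    remote = {"servers/web1/cpu.wsp": 990, "servers/web1/mem.wsp": 400}
    for path, (a, b) in compare_datapoint_counts(local, remote).items():
        print(f"{path}: local={a} remote={b}")
```

A relative tolerance is used rather than an exact match because the hosts keep receiving new datapoints while the sync runs, so small differences are expected. Files missing on one host are always reported.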