
Emergency response to logstash being backlogged
Open, Needs Triage, Public


During the row D blip and the D2 failure, logstash built up a backlog because of the large volume of logs being generated. In such cases it may be necessary to clear the backlog so that recent logs become visible again, e.g. to debug issues that are still ongoing. See also the incident report.

Outline of what is needed:

  • A Spicerack cookbook that stops logstash, resets the kafka offsets (optionally for a subset of topics only), and starts logstash again
  • The above is the bare minimum: it works, but effectively loses the backlog. As an improvement, the previous and new kafka offsets should be persisted somewhere, so that we can later replay the topics between those offsets to backfill the backlog (assuming this happens before the kafka retention expires)
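
The offset bookkeeping in the second point could look roughly like the sketch below. It is only an illustration under stated assumptions: it does not talk to kafka or use any Spicerack API, the offsets are assumed to have been collected already as plain dicts keyed by (topic, partition), and the function names, the JSON layout and the file location are all hypothetical.

```python
import json
import time
from pathlib import Path


def build_offset_record(previous, new, timestamp=None):
    """Pair the pre-reset and post-reset offset of every (topic, partition).

    Both arguments are hypothetical inputs: dicts mapping (topic, partition)
    to an integer offset, gathered before and after the reset.
    """
    record = {
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "partitions": [],
    }
    for (topic, partition), old_offset in sorted(previous.items()):
        record["partitions"].append(
            {
                "topic": topic,
                "partition": partition,
                "previous_offset": old_offset,
                "new_offset": new[(topic, partition)],
            }
        )
    return record


def save_record(record, path):
    """Persist the record as JSON so the backlog can be replayed later."""
    Path(path).write_text(json.dumps(record, indent=2))


def load_record(path):
    """Read back a record written by save_record."""
    return json.loads(Path(path).read_text())


def replay_ranges(record):
    """Return (topic, partition, start, end) for every partition that was
    skipped ahead. start is inclusive and end is exclusive."""
    ranges = []
    for entry in record["partitions"]:
        if entry["new_offset"] > entry["previous_offset"]:
            ranges.append(
                (
                    entry["topic"],
                    entry["partition"],
                    entry["previous_offset"],
                    entry["new_offset"],
                )
            )
    return ranges
```

A later backfill job would load the saved file, call replay_ranges, and consume each listed partition from start up to (but not including) end. Partitions whose offset did not move are left out, since they have nothing to backfill.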
