
Document some etcd cluster operations for Toolforge
Open, Normal · Public


During the Kubernetes outage incident, one of the problems that came up was a lack of documentation around etcd operations.


  • Document the disaster recovery procedure for the v2 etcd nodes
  • Document quirks of the existing v2 nodes (such as timeouts) so they are less likely to cloud root-cause analyses
  • Document adding nodes to and removing nodes from the cluster

Event Timeline

Bstorm triaged this task as Normal priority. Sep 12 2019, 6:50 PM
Bstorm created this task.
Restricted Application added a subscriber: Aklapper. Sep 12 2019, 6:50 PM
Phamhi added a subscriber: Phamhi. Sep 13 2019, 11:10 AM (edited)

I have started the documentation, which can be found here: