Rolling operation cookbook: Detect and remove failed index aliases
Open, Medium, Public

Description

Failed reindexes are fairly common in our Elastic environment. While they're not cause for alarm, they do cause our clusters to dip into red status during routine maintenance operations, such as restarts or reboots.

Our rolling-operation cookbook stops when it detects that the cluster is red (which is good!), but cleaning up the failed indices requires manual intervention. The CirrusSearch extension repo already has a Python script that detects the failed duplicate indices, so let's integrate it into the rolling-operation cookbook.

AC:

  • Rolling operation cookbook detects failed duplicate indices before the maintenance operation and prompts the user to delete them.
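
A rough sketch of what the detect-and-prompt step could look like, for discussion only. This is not the existing CirrusSearch script, and it does not use the cookbook's real interfaces. It assumes that indices are named `<base>_<numeric suffix>`, that the live index for a base name carries at least one alias, and that a leftover from a failed reindex is an index with the same base name and no alias. The function names `find_unaliased_duplicates` and `prompt_for_deletion` are hypothetical. The input is a dict shaped like the body returned by Elasticsearch's `GET /_alias`.

```python
import re
from collections import defaultdict

# Assumption: index names look like "<base>_<numeric suffix>".
_SUFFIXED = re.compile(r"^(?P<base>.+)_(?P<suffix>\d+)$")


def find_unaliased_duplicates(alias_response):
    """Return indices with no alias whose base name also has an aliased index.

    alias_response: dict shaped like the body of GET /_alias, i.e.
    {"<index>": {"aliases": {"<alias>": {...}}}, ...}
    """
    by_base = defaultdict(list)
    for index, body in alias_response.items():
        match = _SUFFIXED.match(index)
        if match is None:
            continue  # not a suffixed index (e.g. a system index), skip it
        has_alias = bool(body.get("aliases"))
        by_base[match.group("base")].append((index, has_alias))

    candidates = []
    for members in by_base.values():
        # Only flag unaliased indices when a sibling index is live (aliased).
        if any(has_alias for _, has_alias in members):
            candidates.extend(name for name, has_alias in members if not has_alias)
    return sorted(candidates)


def prompt_for_deletion(candidates, ask=input):
    """Ask about each candidate; return only the ones the user approved."""
    approved = []
    for index in candidates:
        answer = ask(f"Delete unaliased index {index}? [y/N] ")
        if answer.strip().lower() == "y":
            approved.append(index)
    return approved
```

The sketch only returns the list of approved index names; actually fetching the alias data and issuing the deletes is left to whatever client the cookbook already uses. Defaulting to "no" on anything other than `y` keeps the step safe to run unattended by mistake.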

Event Timeline

bking renamed this task from rolling operation: Detect and remove failed index aliases to Rolling operation cookbook: Detect and remove failed index aliases. Sep 1 2023, 4:28 PM
Gehel triaged this task as Medium priority. Sep 6 2023, 8:33 AM
Gehel moved this task from Incoming to Ready for Work on the Data-Platform-SRE board.
Gehel moved this task from Ready for Work to Misc on the Data-Platform-SRE board.