We have a documented procedure for reindexing after an outage, and it is quite simple as it stands. Still, it would be nice to create a cookbook for it, so that it is available centrally with the other operational procedures.
|T203943 Spicerack cookbooks TODO list
|T251149 [epic] Ryan's onboarding to the Search Platform team
|T219507 Create cookbook to reindex into elasticsearch / cirrus
Some potential nice-to-have features, from recent discussions about reindexing problems:
- Catch reindexing failures in general and issue an alert/warning. Right now the best (and only, and horrible) way is something like: grep -ilP "Reindex task was not successful|fail|error|warn" *log
- Check for multiple indexes: alert, possibly before doing anything else, if there are multiple indexes for a given wiki. Alert again if multiple indexes remain after reindexing.
- Alert if the index didn't change from before reindexing to after (the index name is formatted as <wiki>_<timestamp>, so it should change).
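
To make the three checks above more concrete, here is a rough Python sketch of what they might look like. It is not tied to Spicerack or to any existing cookbook API: all function names are hypothetical, the index lists are assumed to be plain lists of names fetched elsewhere (for example from the cluster's index listing), and everything before the trailing _<timestamp> is treated as the "wiki" part of the name.

```python
import re
from collections import defaultdict
from pathlib import Path

# Same pattern as the grep one-liner in the list above, case-insensitive.
FAILURE_PATTERN = re.compile(
    r"Reindex task was not successful|fail|error|warn", re.IGNORECASE
)

# Assumption: index names look like <wiki>_<timestamp>, timestamp all digits.
INDEX_NAME = re.compile(r"^(?P<wiki>.+)_(?P<timestamp>\d+)$")


def logs_with_failures(log_dir):
    """Return the names of *log files containing a failure marker (like grep -il)."""
    hits = []
    for path in sorted(Path(log_dir).glob("*log")):
        if FAILURE_PATTERN.search(path.read_text(errors="replace")):
            hits.append(path.name)
    return hits


def group_by_wiki(index_names):
    """Group index names by their <wiki> prefix; names not matching the format are skipped."""
    groups = defaultdict(list)
    for name in index_names:
        match = INDEX_NAME.match(name)
        if match:
            groups[match.group("wiki")].append(name)
    return dict(groups)


def wikis_with_multiple_indexes(index_names):
    """Return {wiki: [index, ...]} for every wiki that has more than one index."""
    return {
        wiki: sorted(names)
        for wiki, names in group_by_wiki(index_names).items()
        if len(names) > 1
    }


def wikis_not_reindexed(before, after):
    """Return wikis whose set of index names is identical before and after."""
    after_groups = group_by_wiki(after)
    return sorted(
        wiki
        for wiki, names in group_by_wiki(before).items()
        if set(names) == set(after_groups.get(wiki, []))
    )
```

A cookbook (or whatever approach we end up with) could call wikis_with_multiple_indexes before starting, then run all three checks after the reindex finishes and alert if any of them returns a non-empty result. Keeping the checks as pure functions over lists of names means they can be tested without a cluster.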
I've moved this ticket back to "needs triage" so we can discuss it again in light of the recent problems with T274200 and decide whether to make it more of a priority. We might also consider a different approach than cookbooks, so that non-SREs can reindex.
If this ticket should only be for outage recovery, I can create a new ticket to cover the more general reindexing case.