After doing some testing, Erik has a rough recovery plan (from https://phabricator.wikimedia.org/T295478#7501154):
1. Deploy elasticsearch-repository-swift plugin to eqiad and codfw clusters
1. Configure both clusters to connect to ms-fe.svc.eqiad.wmnet (swift)
1. Snapshot the existing commonswiki_file index from the codfw cluster to swift, and note the snapshot start time
1. Restore the snapshot from swift to the eqiad cluster.
1. Run the CirrusSearch downtime catchup procedure against eqiad for the period between the start of the restore and the point at which the cluster stops failing writes to the commonswiki index.
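
For reference, the snapshot and restore steps above map onto the standard Elasticsearch snapshot API roughly as sketched below. This is only a sketch: the repository and snapshot names are hypothetical, the repository is assumed to be already registered on both clusters via the swift plugin, and the requests are built but not sent.

```python
import json

# Hypothetical names, not taken from the task.
REPO = "commons_recovery"
SNAPSHOT = "commonswiki_file_snapshot"
INDEX = "commonswiki_file"  # index named in the plan above


def snapshot_request(repo, snapshot, index):
    """Build the snapshot request (to be sent to the codfw cluster)."""
    body = {"indices": index, "include_global_state": False}
    return ("PUT", f"/_snapshot/{repo}/{snapshot}", body)


def restore_request(repo, snapshot, index):
    """Build the restore request (to be sent to the eqiad cluster)."""
    body = {"indices": index, "include_global_state": False}
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", body)


if __name__ == "__main__":
    for method, path, body in (
        snapshot_request(REPO, SNAPSHOT, INDEX),
        restore_request(REPO, SNAPSHOT, INDEX),
    ):
        print(method, path, json.dumps(body))
```

Setting `include_global_state` to false keeps cluster-wide settings and templates out of the snapshot, so only the index data moves between clusters.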
Some related notes:
* elasticsearch-repository-swift was never released for 6.5.4. I ended up taking the last commit targeting 6.6.0 and compiling it against 6.5.4 (setting elasticsearchVersion = 6.5.4 and changing gradle from 5 to 4.1). What process should we follow to include this in the plugins .deb, since we are no longer the upstream here?
* Should we have a separate auth setup in swift for cirrussearch snapshots?
* By default, snapshot backup/restore is limited to 20MB/s per partition. Since commonswiki has 32 partitions, the cluster will limit itself to 640MB/s (32 × 20MB/s), or just over 5 gigabits/s. I suspect this is a bit excessive for the swift cluster, or would at least more than double the typical network traffic. What would a more appropriate limit be? @fgiunchedi
* After or during the restore of the snapshot, we will likely need to manually assign the commonswiki_file and commonswiki aliases to the restored index.
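
To make two of the notes above concrete, here is a small sketch of the throttle arithmetic and of the alias assignment. The 1 Gbit/s target is purely illustrative (not a proposed limit), and the restored index name is hypothetical. The alias body uses the standard `POST /_aliases` actions format.

```python
def aggregate_rate(per_partition_mb_s, partitions):
    """Return the cluster-wide rate as (MB/s, Gbit/s)."""
    mb_s = per_partition_mb_s * partitions
    return mb_s, mb_s * 8 / 1000


def per_partition_limit(target_gbit_s, partitions):
    """Per-partition MB/s limit needed to stay under an aggregate target."""
    return target_gbit_s * 1000 / 8 / partitions


def alias_actions(index, aliases):
    """Body for POST /_aliases that adds each alias to the given index."""
    return {"actions": [{"add": {"index": index, "alias": a}} for a in aliases]}


if __name__ == "__main__":
    print(aggregate_rate(20, 32))      # default limit, 32 partitions
    print(per_partition_limit(1, 32))  # illustrative 1 Gbit/s target
    # "commonswiki_file_restored" is a hypothetical concrete index name.
    print(alias_actions("commonswiki_file_restored",
                        ["commonswiki_file", "commonswiki"]))
```

With the defaults quoted above, `aggregate_rate(20, 32)` gives 640 MB/s, which is 5.12 Gbit/s. Capping the aggregate at the illustrative 1 Gbit/s would mean roughly 3.9 MB/s per partition.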