User story: As a search engineer, I would like there to be only one copy of a given index in production, so that I know which one is the live index and can tell that reindexing was successful.
From mwmaint1002, you can run this janky command-line snippet to find duplicate indexes:
for cluster in search.svc.eqiad.wmnet search.svc.codfw.wmnet cloudelastic.wikimedia.org; do
  for port in 9243 9443 9643; do
    echo "$cluster:$port"
    curl -s "https://$cluster:$port/_cat/indices" \
      | perl -pe 's/^\S+\s+\S+\s+(\S+)\s+.*/$1/; s/_(\d+|first)//;' \
      | sort | uniq -c | sort -n \
      | grep -vP '^\s+1\s'
  done
done
There are 46 indexes with duplicates across eight of the nine cluster/port combos, including glent on eqiad and codfw. Most have two copies, but many have three, especially on cloudelastic:9243. It was unclear in the Wednesday Meeting today whether there should be multiple glent indexes or not, so those may be okay. Having one duplicate at a time during reindexing is probably valid, too.
When the current round of reindexing is done, we can clean up duplicates.
Acceptance Criteria:
- There are no unexpected (glent?) duplicate indexes in the eqiad, codfw, or cloudelastic clusters.
Bonus result:
- A better way to find duplicate indexes than the command line abomination above.
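As a starting point for that bonus, here is a hedged sketch in Python: it fetches index names from the JSON form of the `_cat/indices` API and groups them by base name. The suffix-stripping mirrors the perl substitution above (dropping a trailing `_<digits>` or `_first`), but anchored to the end of the name, which is assumed to be the intended convention; the cluster endpoints and TLS reachability from mwmaint1002 are also assumptions.

```python
import json
import re
import urllib.request
from collections import defaultdict

def base_name(index: str) -> str:
    """Strip a trailing _<digits> or _first suffix, mirroring the perl one-liner
    (anchored at the end of the name, which is assumed to be the convention)."""
    return re.sub(r'_(\d+|first)$', '', index)

def find_duplicates(index_names):
    """Group index names by base name; return only bases with more than one copy."""
    groups = defaultdict(list)
    for name in index_names:
        groups[base_name(name)].append(name)
    return {base: names for base, names in groups.items() if len(names) > 1}

def indices_for(cluster, port):
    """Fetch index names from one cluster via the JSON _cat API (assumed reachable)."""
    url = f'https://{cluster}:{port}/_cat/indices?format=json&h=index'
    with urllib.request.urlopen(url) as resp:
        return [row['index'] for row in json.load(resp)]

# Sketch of usage against the cluster/port combos from this task (not run here):
#   for cluster in ('search.svc.eqiad.wmnet', 'search.svc.codfw.wmnet',
#                   'cloudelastic.wikimedia.org'):
#       for port in (9243, 9443, 9643):
#           for base, names in sorted(find_duplicates(indices_for(cluster, port)).items()):
#               print(f'{cluster}:{port} {base}: {", ".join(names)}')
```

Because the grouping logic is separated from the HTTP fetch, the duplicate detection can be exercised against a plain list of index names without touching production.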