
Cleanup duplicate indices in cloudelastic
Closed, ResolvedPublic


The indices that are supposed to be split between the psi and omega clusters of cloudelastic instead exist on both clusters. At some point (during initial deployment?) the name/port mappings in cloudelastic were mixed up, so data for the psi cluster in prod went to omega in cloudelastic, and data for omega in prod ended up in psi in cloudelastic.

Clean up all the unreferenced indices left over by the mix-up.

Event Timeline

This uses a single wiki as an example, but the same pattern repeats for all of the wikis that are split between omega and psi. Here acewiki correctly does not exist on 9243 (chi). It should not exist on 9443 (omega), but it does exist on cloudelastic:9443. It should exist on 9643 (psi), and it does in all clusters.

ebernhardson@mwmaint1002:~$ for port in 9243 9443 9643; do for cluster in search.svc.{eqiad,codfw}.wmnet; do echo $cluster:$port; curl https://$cluster:$port/_cat/indices | awk '/acewiki/ { print $1 }'; done; done
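The same per-cluster check can be sketched in Python. This is only an illustration, not the script used on this task. It assumes the listing was fetched with `_cat/indices?h=index` (one index name per line) and that index names follow a `<wiki>_<type>_<suffix>` pattern; the sample names below are hypothetical.

```python
from typing import List


def indices_for_wiki(cat_output: str, wiki: str) -> List[str]:
    """Return the index names belonging to `wiki` from `_cat/indices?h=index` output.

    Matching on the "<wiki>_" prefix (an assumed naming convention) avoids
    picking up other wikis whose names merely contain the same substring.
    """
    names = (line.strip() for line in cat_output.splitlines())
    return sorted(name for name in names if name.startswith(wiki + "_"))


# Hypothetical listing, for illustration only.
SAMPLE_LISTING = """\
acewiki_content_1
acewiki_general_1
acewikibooks_content_1

enwiki_content_2
"""

if __name__ == "__main__":
    for name in indices_for_wiki(SAMPLE_LISTING, "acewiki"):
        print(name)
```

An empty result for a given cluster and port means the wiki has no indices there, which is what should be seen on chi and omega for acewiki.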

Double checking cluster assignments in mediawiki we get:

ebernhardson@mwmaint1002:~$ mwscript shell.php --wiki=acewiki
Psy Shell v0.10.5 (PHP 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1+icu63 — cli) by Justin Hileman
>>> (new CirrusSearch\SearchConfig())->getClusterAssignment()->getServerList('cloudelastic')
=> [
       "host" => "localhost",
       "transport" => "Http",
       "port" => 6107,

Making a connection verifies this is psi:

ebernhardson@mwmaint1002:~$ curl localhost:6107
  "name" : "cloudelastic1001-cloudelastic-psi-eqiad",

Referencing mediawiki-config we can see this should be cloudelastic-psi:

'cloudelastic-psi' => [
    [ // forwarded to
        'host' => 'localhost',
        'transport' => 'Http',
        'port' => 6107,

Overall we will need to do the santa thing, make a list and check it twice, before deleting all the unused indices from cloudelastic.

Pondering this, first step should probably be closing rather than deleting the indices. Closed indices can be easily reopened if we start getting errors from CirrusSearch that we closed an active index. Without errors after some reasonable time period the indices can be safely deleted.
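A minimal sketch of the close-first approach, using the standard Elasticsearch index endpoints (`POST /<index>/_close`, `POST /<index>/_open`, `DELETE /<index>`). The helper name, base URL and index name are hypothetical, and the requests are only built here, not sent.

```python
import urllib.parse
import urllib.request


def index_request(base_url: str, index: str, action: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP request for an index lifecycle action.

    "close" is the reversible first step, "open" undoes it, and "delete" is
    the final, irreversible step once nothing has complained.
    """
    quoted = urllib.parse.quote(index, safe="")
    if action == "close":
        return urllib.request.Request(f"{base_url}/{quoted}/_close", method="POST")
    if action == "open":
        return urllib.request.Request(f"{base_url}/{quoted}/_open", method="POST")
    if action == "delete":
        return urllib.request.Request(f"{base_url}/{quoted}", method="DELETE")
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    # Hypothetical host and index name, for illustration only.
    req = index_request("https://example.invalid:9443", "acewiki_content_1", "close")
    print(req.get_method(), req.full_url)
    # Sending would be urllib.request.urlopen(req), once the target is confirmed.
```

Keeping request construction separate from sending makes it easy to print and review the full list of operations before anything is run against a cluster.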

Note that T279607 also has some indices to clean up; it might make sense to address both at the same time.

Change 682189 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Reconcile configured indices with live state

Ran the script from the patch above. The initial report is in P15515, with 1655 indices across clusters that don't match the set of indices expected to exist. On cloudelastic I've closed all indices where the script identified another index as the current live index, such as when the index is on the wrong cluster or is left over from a failed reindex. Watching logstash I don't see anything new complaining, so it is probable that these indices were correctly classified. Will wait until Monday to actually delete anything though.
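The classification described here can be illustrated roughly as follows. This is a simplified sketch and not the script from change 682189: the function name, the cluster-to-index mappings and the reason strings are all hypothetical. It separates indices that are live on another cluster (safe candidates for closing) from those with no known live copy (candidates for manual review).

```python
from typing import Dict, List, Set, Tuple


def classify_unexpected(
    live: Dict[str, Set[str]],
    expected: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """Report indices present on a cluster where they are not expected.

    live:     cluster name -> index names that actually exist there
    expected: cluster name -> index names the configuration says should exist
    Returns (cluster, index, reason) tuples in sorted order.
    """
    problems = []
    for cluster in sorted(live):
        unexpected = live[cluster] - expected.get(cluster, set())
        for index in sorted(unexpected):
            # Is this index expected, and therefore live, on some other cluster?
            elsewhere = sorted(
                other for other, names in expected.items()
                if other != cluster and index in names
            )
            reason = f"live on {elsewhere[0]}" if elsewhere else "unknown"
            problems.append((cluster, index, reason))
    return problems


if __name__ == "__main__":
    # Hypothetical state, for illustration only.
    live = {"psi": {"acewiki_content"}, "omega": {"acewiki_content", "junk"}}
    expected = {"psi": {"acewiki_content"}, "omega": set()}
    for row in classify_unexpected(live, expected):
        print(row)
```

In this toy state, the omega copy of `acewiki_content` is reported as live on psi, while `junk` has no known owner and would fall into the manual-review group.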

Will do a bit more manual review for the remaining 48 problem indices. I could likely close them in a scripted fashion like the other 1600, but since there aren't that many and mistakes on the prod clusters are more painful, I've decided on manual review and close for the moment.

Cleared out the remaining 48 indices from the prod clusters on Friday. Checking the weekend logs I don't see anything particularly suspicious, so I will go ahead and delete all the closed indices. Separately, we seem to be sending archive deletes to cloudelastic, even though cloudelastic doesn't have archive indices. These result in job queue failures and log entries that we should clean up, even if the invalid deletes have no direct negative effects.

The main purpose of this task is complete: the indices are cleaned up. Still need to finish code review on the scripts that were used so they are available in the future. Also need to update wikitech; I think there are a few hacky versions of finding stale indices in there.

Change 682189 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Reconcile configured indices with live state

Updated wikitech, dropping the section on clearing out duplicate titlesuggest indices and updating the section on removing duplicate indices. This should now be complete.