
Elasticsearch chi@eqiad cluster contains invalid cross cluster settings
Closed, Resolved · Public · 5 Estimated Story Points

Description

Context: The cross cluster settings are used to search across different projects. If they are invalid, crosslanguage and crossproject search breaks for users (silently: the widget does not show up). In this instance it was broken from 2022-04-07 18:30 to 2022-04-08 08:30.

The way to configure cross cluster seeds changed between elastic 5 and 6: previously the key search.remote had to be set, now it must be cluster.remote.

It appears that the cluster chi@eqiad still contains references to search.remote in addition to cluster.remote:

{
  "search": {
    "remote": {
      "omega": {
        "seeds": [
          "elastic1034.eqiad.wmnet:9500",
          "elastic1040.eqiad.wmnet:9500",
          "elastic1038.eqiad.wmnet:9500"
        ]
      },
      "psi": {
        "seeds": [
          "elastic1052.eqiad.wmnet:9700",
          "elastic1048.eqiad.wmnet:9700",
          "elastic1050.eqiad.wmnet:9700"
        ]
      }
    }
  },
  "cluster": {
    "remote": {
      "chi": {
        "seeds": [
          "elastic1054.eqiad.wmnet:9300",
          "elastic1074.eqiad.wmnet:9300",
          "elastic1081.eqiad.wmnet:9300"
        ]
      },
      "omega": {
        "seeds": [
          "elastic1068.eqiad.wmnet:9500",
          "elastic1076.eqiad.wmnet:9500",
          "elastic1057.eqiad.wmnet:9500"
        ]
      },
      "psi": {
        "seeds": [
          "elastic1073.eqiad.wmnet:9700",
          "elastic1075.eqiad.wmnet:9700",
          "elastic1083.eqiad.wmnet:9700"
        ]
      }
    },
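
A rough way to spot this kind of conflict, assuming the persistent section of the cluster settings has already been fetched and parsed into a dict shaped like the excerpt above. The function name conflicting_remotes and the sample variable are hypothetical, and the sample seed lists are shortened to one host each from the listing above:

```python
from typing import Dict, List, Tuple


def conflicting_remotes(persistent: dict) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return remote aliases whose seeds differ between the legacy
    search.remote section and the current cluster.remote section."""
    legacy = persistent.get("search", {}).get("remote", {})
    current = persistent.get("cluster", {}).get("remote", {})
    conflicts = {}
    # Only aliases defined in both sections can conflict.
    for alias in sorted(legacy.keys() & current.keys()):
        old = legacy[alias].get("seeds") or []
        new = current[alias].get("seeds") or []
        if sorted(old) != sorted(new):
            conflicts[alias] = (old, new)
    return conflicts


# Shortened sample based on the settings shown in this task.
sample = {
    "search": {"remote": {"omega": {"seeds": ["elastic1034.eqiad.wmnet:9500"]}}},
    "cluster": {"remote": {
        "chi": {"seeds": ["elastic1054.eqiad.wmnet:9300"]},
        "omega": {"seeds": ["elastic1068.eqiad.wmnet:9500"]},
    }},
}

print(sorted(conflicting_remotes(sample)))  # prints ['omega']
```

This only compares the two sections; it does not tell which one elastic actually uses at runtime.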

Unfortunately there does not seem to be a way to delete these settings as neither of:

curl -XPUT -H"Content-Type: application/json" localhost:6102/_cluster/settings -d'{"persistent": {"search.remote": null}}'
curl -XPUT -H"Content-Type: application/json" localhost:6102/_cluster/settings -d'{"persistent": {"search" :{"remote": null}}}'

works; both return the error "persistent setting [search.remote], not recognized".
What works is:

curl -XPUT -H"Content-Type: application/json" localhost:6102/_cluster/settings -d'{"persistent": {"search" :{"remote": {"omega": {"seeds": null}}}}}'

But it seems to hit some backward-compatibility (BC) code: it affects the cluster.remote section and leaves the search.remote section intact.
Re-adding the seeds to cluster.remote with:

curl -XPUT -H"Content-Type: application/json" localhost:6102/_cluster/settings -d'{"persistent": {"search" :{"remote": {"omega": {"seeds": ["elastic1068.eqiad.wmnet:9500", "elastic1076.eqiad.wmnet:9500", "elastic1057.eqiad.wmnet:9500"]}}}}}'

does seem to force elastic to use these seeds instead of the old ones. There seems to be some dependency on the order in which these sections are loaded, such that the old (and un-deletable?) search.remote section can end up being used as the source for the seeds.
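
The two-step workaround above (null the seeds, then set them again, both through the legacy search.remote key) can be sketched as a small payload builder. This is only an illustration under assumptions: the helper name seed_reset_payloads is hypothetical, the bodies would still have to be sent with PUT to _cluster/settings as in the curl commands, and whether the BC behaviour is the same on other versions is not confirmed here.

```python
import json
from typing import List, Tuple


def seed_reset_payloads(alias: str, seeds: List[str]) -> Tuple[str, str]:
    """Build the two request bodies used in the workaround:
    first clear the seeds for `alias`, then set them again."""

    def body(value):
        # Same nested shape as the curl commands in this task.
        return json.dumps(
            {"persistent": {"search": {"remote": {alias: {"seeds": value}}}}}
        )

    return body(None), body(seeds)


clear_body, set_body = seed_reset_payloads(
    "omega",
    [
        "elastic1068.eqiad.wmnet:9500",
        "elastic1076.eqiad.wmnet:9500",
        "elastic1057.eqiad.wmnet:9500",
    ],
)
print(clear_body)
print(set_body)
```

Building the bodies with json.dumps avoids hand-quoting mistakes in the shell one-liners; the order of the two requests matters, so they are returned as an ordered pair.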

AC:

  • understand what's going on (question upstream)
  • cleanup the search.remote section from the chi@eqiad cluster settings

Event Timeline


Mentioned in SAL (#wikimedia-operations) [2022-04-20T07:49:34Z] <dcausse> T305689: reset crosscluster settings of the elastic chi cluster in eqiad

I could not reproduce such duplicated settings, even when performing the following upgrades: 5.5 -> 6.3 -> 6.3.1 -> 6.4.2 -> 6.5.4.
Testing with the node state taken from the master node (e.g. elastic1054.eqiad.wmnet:/srv/elasticsearch/production-search-eqiad/nodes/0/_state/), I was able to boot elasticsearch v7.10.2 locally without errors; the duplicated settings disappeared and I could update them properly.
I'm going to assume that this won't cause any problem for us and will resolve itself when we upgrade to 7.10.