
elasticsearch: make rolling upgrade proceed to second host after shards are stable
Closed, Resolved · Public · Estimated Story Points: 3

Description

Context

During a rolling upgrade, for example from 6.5.4 -> 6.8.23, the cluster gets stuck in yellow status after the first host is upgraded, with an allocation explanation like the following:

{"index":"queries_27012021","shard":3,"primary":false,"current_state":"unassigned","unassigned_info":{"reason":"CLUSTER_RECOVERED","at":"2022-03-21T21:46:37.871Z","last_allocation_status":"no_attempt"},"can_allocate":"no","allocate_explanation":"cannot allocate because allocation is not permitted to any of the nodes","node_allocation_decisions":[{"node_id":"E7e7HF1YTvSql8UdZVrLBQ","node_name":"relforge1003-relforge-eqiad","transport_address":"10.64.5.37:9300","node_attributes":{"hostname":"relforge1003","rack":"A2","fqdn":"relforge1003.eqiad.wmnet","row":"A"},"node_decision":"no","deciders":[{"decider":"same_shard","decision":"NO","explanation":"the shard cannot be allocated to the same node on which a copy of the shard already exists [[queries_27012021][3], node[E7e7HF1YTvSql8UdZVrLBQ], [P], s[STARTED], a[id=4DkiEULDRum86eYAs1T9_g]]"}]},{"node_id":"JYN55FKeSpSEuEqGsMzjIA","node_name":"relforge1004-relforge-eqiad","transport_address":"10.64.21.126:9300","node_attributes":{"hostname":"relforge1004","rack":"B2","row":"B","fqdn":"relforge1004.eqiad.wmnet"},"node_decision":"no","deciders":[{"decider":"node_version","decision":"NO","explanation":"cannot allocate replica shard to a node with version [6.5.4] since this is older than the primary version [6.8.23]"}]}]}

Solution

Following the instructions at https://www.elastic.co/guide/en/elasticsearch/reference/7.17/rolling-upgrades.html, we can have the cookbook proceed to the second host. The relevant guidance:

During a rolling upgrade, primary shards assigned to a node running the new version cannot have their replicas assigned to a node with the old version. The new version might have a different data format that is not understood by the old version.
If it is not possible to assign the replica shards to another node (there is only one upgraded node in the cluster), the replica shards remain unassigned and status stays yellow.
In this case, you can proceed once there are no initializing or relocating shards (check the init and relo columns).
As soon as another node is upgraded, the replicas can be assigned and the status will change to green.

This means that, as a special case, after the first host is upgraded we want the cookbook to apply the logic above (proceed once there are no initializing or relocating shards) instead of waiting for the cluster to become green; a sketch of this is shown below. Subsequent hosts will follow the normal protocol of waiting for green before proceeding.
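In cookbook terms, that special case could look roughly like the sketch below: poll the cluster health API until the initializing and relocating shard counts reach zero, rather than waiting for green. This is an illustrative sketch using the elasticsearch Python client; the function name and client wiring are hypothetical and not the actual spicerack/cookbook code referenced in the patches below.

```
# Sketch only: wait for shard stability (no initializing/relocating shards)
# instead of a green cluster status. Client setup and function name are
# illustrative, not the real spicerack/cookbook implementation.
import time

from elasticsearch import Elasticsearch


def wait_for_stable_shards(es: Elasticsearch, timeout: int = 3600, poll: int = 10) -> None:
    """Block until the cluster reports no initializing or relocating shards."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        health = es.cluster.health()
        if health["initializing_shards"] == 0 and health["relocating_shards"] == 0:
            return
        time.sleep(poll)
    raise TimeoutError("shards did not stabilize before the timeout")


es = Elasticsearch(["http://relforge1003.eqiad.wmnet:9200"])

# After the first upgraded node: yellow status is acceptable, only require stability.
wait_for_stable_shards(es)

# For subsequent nodes the normal protocol applies, e.g.:
# es.cluster.health(wait_for_status="green", request_timeout=3600)
```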

Blocker for

T295666

Event Timeline

Change 776999 had a related patch set uploaded (by Ryan Kemper; author: Bking):

[operations/software/spicerack@master] elastic: don't wait for green on first node

https://gerrit.wikimedia.org/r/776999

Change 778335 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/cookbooks@master] elastic: allow waiting for yellow instead of green

https://gerrit.wikimedia.org/r/778335

Change 776999 merged by jenkins-bot:

[operations/software/spicerack@master] elastic: don't wait for green on first node

https://gerrit.wikimedia.org/r/776999

Change 778335 merged by Bking:

[operations/cookbooks@master] elastic: allow waiting for yellow instead of green

https://gerrit.wikimedia.org/r/778335

Gehel claimed this task.