
Elasticsearch index creation fails for new wikis
Closed, Resolved · Public

Description

See: https://console.cloud.google.com/errors/detail/COzs4uqBqeePvgE

Looks like this job failed for an unknown reason. This is happening on Elasticsearch 7 and also appears to have happened in the past on ES 6.

We noticed this seems to have happened for two consecutive wiki IDs. We want to keep an eye on this: if it is happening for more wikis, index creation may actually be broken for everyone.

Patches

Event Timeline

The root cause behind the index failures is that we have reached our cluster's shard limit:

⧼Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [2000]/[2000] maximum shards open;⧽

Our temporary workaround was to increase cluster.max_shards_per_node to 1200 and rerun the indexing jobs for any instances that failed.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/size-your-shards.html#_this_action_would_add_x_total_shards_but_this_cluster_currently_has_yz_maximum_shards_open
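
As a minimal sketch, an override like this can be applied as a persistent cluster setting ({CLUSTER} standing in for the cluster URL; a transient setting would also work, but is cleared on restart):

PUT {CLUSTER}/_cluster/settings
{
    "persistent": {
        "cluster.max_shards_per_node": 1200
    }
}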

A better long-term solution might be to add an additional data node on production.

Steps:

  • Configure MediaWiki and CirrusSearch to start creating indices with a limit of one replica
  • Update all existing indices to scale to at most one replica (a verification sketch follows this list) using:
PUT {CLUSTER}/mwdb_*/_settings
{
    "index": {
        "auto_expand_replicas": "0-1"
    }
}
  • Deploy an additional data node on production
  • Manually run index creation jobs for instances where it failed

Andrew-WMDE renamed this task from ElasticsearchInit failed for a new wiki to Elasticsearch index creation fails for new wikis. Nov 8 2023, 11:55 AM
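
After updating the replica settings, the change and the remaining shard headroom can be verified with something like the following (filter_path only trims the responses):

GET {CLUSTER}/mwdb_*/_settings?filter_path=*.settings.index.auto_expand_replicas

GET {CLUSTER}/_cluster/health?filter_path=status,active_shards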

Reverted cluster.max_shards_per_node back to 1000
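
A sketch of that revert, assuming the override was applied as a persistent setting; setting it to null falls back to the default of 1000:

PUT {CLUSTER}/_cluster/settings
{
    "persistent": {
        "cluster.max_shards_per_node": null
    }
}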

Due to critical heap usage, I'll be limiting cluster.max_shards_per_node to 800. This is still well above Elastic's recommendation of at most 640 shards per node for 32 GB of heap. When we run out of shards in the future, we can incrementally increase the limit as long as heap usage remains within reason. We need to add additional data nodes once we can no longer increase the limit or the heap size.

  • 20 shards or fewer per GB of heap memory; see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/size-your-shards.html#shard-count-recommendation
  • Heap usage should not exceed 85%; see https://www.elastic.co/guide/en/elasticsearch/reference/current/high-jvm-memory-pressure.html
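
Heap pressure against that 85% threshold can be spot-checked per node with the standard node stats API, for example:

GET {CLUSTER}/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent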

Evelien_WMDE claimed this task.