Upgrade wmf_opensearch_search_plugins .deb and restart opensearch
Closed, Resolved, Public

Description

Search clusters that have already migrated to OpenSearch need the new package version installed to pick up the sudachi analyzer added in the parent task.

AC:

  • Ensure the 1.3.20-2 package is installed on cloudelastic and relforge.
  • Perform a rolling restart of both clusters.
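For reference, the first criterion could be spot-checked on a host with something like the sketch below. It is only an illustration: the exact Debian package name is not stated in this task, so it is left as a parameter, and `installed_version` and `is_upgraded` are hypothetical helper names.

```python
import subprocess
from typing import Optional

# Version named in the acceptance criteria above.
EXPECTED_VERSION = "1.3.20-2"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a Debian package, or None if unknown."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "-f=${Version}", package],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # dpkg-query is not available on this host.
        return None
    if result.returncode != 0:
        # dpkg-query exits non-zero when the package is not known.
        return None
    return result.stdout.strip() or None


def is_upgraded(version: Optional[str], expected: str = EXPECTED_VERSION) -> bool:
    """True only when the installed version matches the expected one exactly."""
    return version == expected
```

Calling `is_upgraded(installed_version(name))` on each node, with `name` set to the real package name, would tell you whether that node still needs the upgrade.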

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2025-03-17T21:29:06Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-17T21:30:10Z] <bking@cumin2002> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-17T21:34:29Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-17T21:35:32Z] <bking@cumin2002> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Quick update before I head out for the day:

  • The relforge cluster has been updated. Keep in mind that it's a mixed cluster, so technically only relforge1004 has been updated.
  • We could not use our rolling-operation cookbook on cloudelastic*, but we are happy to restart the cluster manually if this is time-sensitive.
  • We've noticed some weirdness related to sudachi on the cloudelastic cluster, which hopefully can be fixed by this patch.**

*We are in the process of updating our cookbooks for OpenSearch; see T383811 for details.

**See scrollback in Wikimedia-Search IRC for more details

We found out that the new plugins package is not 100% ready to go, or at least its integration with our systems is not. Any attempt to create a shard fails because sudachi cannot locate the appropriate dictionary. This happens even for a simple shard move of an index that does not refer to the new sudachi analyzer.

The .deb package places the dictionary in /usr/share/opensearch/config/sudachi/system_core.dict, but unfortunately the instances look for it in their per-instance config directories. We will likely need a puppet patch to symlink it into the appropriate places.
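To illustrate the idea (the real fix would be the puppet patch, not a script), here is a rough sketch of linking the shared dictionary into each instance's config directory. The per-instance directory paths are not given in this task, so they are passed in by the caller; `link_dictionary` is a hypothetical helper, and placing the link under a `sudachi/` subdirectory mirrors the package layout as an assumption.

```python
from pathlib import Path
from typing import Iterable, List


def link_dictionary(source: Path, instance_config_dirs: Iterable[Path]) -> List[Path]:
    """Symlink the shared dictionary into <config_dir>/sudachi/ for each instance.

    Returns the links that were newly created. Existing files or links are
    left untouched, so the function is safe to run more than once.
    """
    created = []
    for config_dir in instance_config_dirs:
        # Assumption: each instance expects the same sudachi/<file> layout
        # that the package uses under the shared config directory.
        link = config_dir / "sudachi" / source.name
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            continue
        link.symlink_to(source)
        created.append(link)
    return created
```

Here `source` would be the packaged path `/usr/share/opensearch/config/sudachi/system_core.dict`, and the directory list would come from however the instances are configured on the host.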

Sorry Sudachi is being difficult! It  ̶b̶e̶t̶t̶e̶r̶ ̶b̶e̶ ̶ will be worth it when it goes live!

Mentioned in SAL (#wikimedia-operations) [2025-03-19T18:28:35Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-19T18:29:44Z] <bking@cumin2002> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-19T19:22:33Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-19T19:23:35Z] <bking@cumin2002> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-19T20:17:15Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-19T20:17:45Z] <bking@cumin2002> END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-19T20:55:33Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-19T20:56:41Z] <bking@cumin2002> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-20T14:08:39Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-20T14:11:00Z] <bking@cumin2002> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-20T14:12:05Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-20T14:35:54Z] <bking@cumin2002> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-20T14:57:20Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: try rolling operation without allow-yellow flag - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-20T15:14:42Z] <bking@cumin2002> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: try rolling operation without allow-yellow flag - bking@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-21T21:26:40Z] <ryankemper@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: try rolling operation without allow-yellow flag - ryankemper@cumin2002 - T389119

Mentioned in SAL (#wikimedia-operations) [2025-03-21T21:45:06Z] <ryankemper@cumin2002> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: try rolling operation without allow-yellow flag - ryankemper@cumin2002 - T389119

Per today's Search Platform standup, we (Data Platform SRE) believe the AC is done here. As such, I am closing out this ticket. Please feel free to reopen if we missed something.