
Deploy S3 plugin on all Search team-managed Elastic hosts
Closed, Resolved (Public)

Description

Before we can restore our lost index in cloudelastic (T309648), we need to deploy the S3 Elastic plugin across all Search team-managed Elastic hosts.

This covers three groups of hosts: cloudelastic, prod-codfw, and prod-eqiad. ("Cluster" is probably not the right word for these groups, since each host runs at least 2 instances of ES, each of which belongs to its own cluster.)
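Because each host runs more than one ES instance, it may help to verify the plugin per instance rather than per host. The sketch below is only an illustration, not the procedure used in this task: it assumes the rows come from Elasticsearch's `GET _cat/plugins?format=json` endpoint (which reports a `name` and `component` for each node/plugin pair) and that the S3 plugin shows up as `repository-s3`. The node names and the helper `nodes_missing_plugin` are hypothetical.

```python
from typing import Iterable, Mapping

# Assumption: the S3 snapshot plugin is reported under this component name.
S3_PLUGIN = "repository-s3"


def nodes_missing_plugin(
    cat_plugins_rows: Iterable[Mapping[str, str]],
    expected_nodes: Iterable[str],
    plugin: str = S3_PLUGIN,
) -> list[str]:
    """Return the expected node names that do not report `plugin`.

    `cat_plugins_rows` is the parsed JSON list from `_cat/plugins?format=json`;
    each row describes one plugin installed on one node.
    """
    nodes_with_plugin = {
        row["name"] for row in cat_plugins_rows if row.get("component") == plugin
    }
    return sorted(set(expected_nodes) - nodes_with_plugin)


# Hypothetical sample data: node-b has not picked up the S3 plugin yet.
rows = [
    {"name": "node-a", "component": "repository-s3"},
    {"name": "node-a", "component": "analysis-icu"},
    {"name": "node-b", "component": "analysis-icu"},
]
missing = nodes_missing_plugin(rows, ["node-a", "node-b"])
print(missing)
```

Running the same check against each cluster's endpoint (one per ES instance on a host) would show which instances still need a restart before the plugin is usable.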

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2022-06-01T21:33:38Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to enable S3 plugin - bking@cumin1001 - T309720

Mentioned in SAL (#wikimedia-operations) [2022-06-01T21:33:58Z] <bking@cumin1001> END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to enable S3 plugin - bking@cumin1001 - T309720

Mentioned in SAL (#wikimedia-operations) [2022-06-01T22:13:23Z] <ryankemper> T309720 Downtimed cloudelastic until Monday while we perform maintenance across the next couple days (will manually lift downtime later)

Mentioned in SAL (#wikimedia-operations) [2022-06-02T05:15:46Z] <ryankemper> T309720 Finished manual rolling restart of cloudelastic cluster to get new S3 plugin operational

Mentioned in SAL (#wikimedia-operations) [2022-06-02T14:59:51Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720

Mentioned in SAL (#wikimedia-operations) [2022-06-02T16:15:28Z] <bking@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720

Mentioned in SAL (#wikimedia-operations) [2022-06-02T16:33:33Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720

Mentioned in SAL (#wikimedia-operations) [2022-06-02T17:39:46Z] <bking@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720

Mentioned in SAL (#wikimedia-operations) [2022-06-02T19:10:01Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720

Mentioned in SAL (#wikimedia-operations) [2022-06-03T01:12:02Z] <bking@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: restart to enable S3 plugin - bking@cumin1001 - T309720

Mentioned in SAL (#wikimedia-operations) [2022-06-03T16:11:45Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: restart to enable S3 plugin - bking@cumin1001 - T309720

Mentioned in SAL (#wikimedia-operations) [2022-06-03T21:36:31Z] <bking@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: restart to enable S3 plugin - bking@cumin1001 - T309720

Change 803300 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] Elastic: Add elastic bindir to root's path

https://gerrit.wikimedia.org/r/803300

bking changed the task status from Open to In Progress. (Jun 6 2022, 3:51 PM)
bking triaged this task as High priority.

Change 803300 merged by Bking:

[operations/puppet@production] Elastic: Add elastic bindir to root's path

https://gerrit.wikimedia.org/r/803300

Change 803321 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/software/elasticsearch/plugins@master] Revert "Revert "Upgrade to elasticsearch 7.10.2""

https://gerrit.wikimedia.org/r/803321

Change 803321 abandoned by Ryan Kemper:

[operations/software/elasticsearch/plugins@master] Revert "Revert "Upgrade to elasticsearch 7.10.2""

Reason:

made redundant by Ic76dfce2084ba9e6d0f77510d40d074fba2b88f6

https://gerrit.wikimedia.org/r/803321