[Tracking task] Pair with Brian King on various operational tasks
Closed, Resolved (Public)

Description

Just making this meta-task so that I don't forget anything I want to cover. It also gives us a task to associate actions like the rolling restart with, given that they're not tied to a specific ticket.

  • (fri dec 17) get shell access && puppet-merge
  • (mon dec 20) configure pwstore -> ban elastic1043 from cluster -> ssh into mgmt console of elastic1043 and perform a power cycle (it won't do anything because the node is borked); wdqs & wcqs deploys
  • (tues dec 21) elasticsearch rolling restart (if we do eqiad or codfw it might break a dump, which isn't a huge deal, but we may want to just do cloudelastic for that reason)
  • (weds dec 22) Briefly go over the incident documentation process: https://wikitech.wikimedia.org/wiki/Incident_status

No pairing thurs dec 23 (ryan OOO)

Event Timeline

RKemper triaged this task as Medium priority. Dec 17 2021, 10:58 PM
RKemper updated the task description.

Mentioned in SAL (#wikimedia-operations) [2021-12-17T23:14:41Z] <ryankemper> T297986 Beep boop testing 1 2 3 disregard me

Mentioned in SAL (#wikimedia-operations) [2021-12-22T23:01:31Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T297986

Mentioned in SAL (#wikimedia-operations) [2021-12-23T00:04:14Z] <bking@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T297986
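
For reference, the START and END entries above are enough to work out how long the cloudelastic rolling restart took. The following is only a rough sketch: the helper name `sal_timestamp` and the shortened log strings are made up for illustration and are not part of any SAL or cookbook tooling. It assumes the bracketed timestamps are always UTC in the `YYYY-MM-DDTHH:MM:SSZ` form seen in this task.

```python
import re
from datetime import datetime, timezone

# Matches the bracketed UTC timestamp used in the SAL entries in this task.
SAL_TS = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\]")


def sal_timestamp(line: str) -> datetime:
    """Hypothetical helper: extract the timestamp from one SAL entry."""
    match = SAL_TS.search(line)
    if match is None:
        raise ValueError(f"no SAL timestamp found in: {line!r}")
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


# Shortened copies of the two cookbook entries logged on this task.
start_line = "[2021-12-22T23:01:31Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation"
end_line = "[2021-12-23T00:04:14Z] <bking@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-operation"

elapsed = sal_timestamp(end_line) - sal_timestamp(start_line)
print(elapsed)  # 1:02:43
```

So the restart (3 nodes at a time) ran for a little over an hour, crossing midnight UTC.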