Add reimage to Elastic rolling-operation cookbook
Open, Needs Triage, Public

Description

To ease the upcoming Debian Bullseye upgrade, add a reimage operation, backed by the sre.hosts.reimage cookbook, to the Elastic rolling-operation cookbook.

Event Timeline

Change 792719 had a related patch set uploaded (by Bking; author: Bking):

[operations/cookbooks@master] elastic: add reimage to rolling-operation

https://gerrit.wikimedia.org/r/792719

Change 792719 merged by Ryan Kemper:

[operations/cookbooks@master] elastic: add reimage to rolling-operation

https://gerrit.wikimedia.org/r/792719

Mentioned in SAL (#wikimedia-operations) [2022-05-24T19:21:21Z] <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Mentioned in SAL (#wikimedia-operations) [2022-05-24T19:22:15Z] <ryankemper@cumin1001> END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Mentioned in SAL (#wikimedia-operations) [2022-05-24T19:24:01Z] <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Mentioned in SAL (#wikimedia-operations) [2022-05-24T19:24:08Z] <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Mentioned in SAL (#wikimedia-operations) [2022-05-24T19:24:34Z] <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Mentioned in SAL (#wikimedia-operations) [2022-05-24T19:24:50Z] <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Change 798973 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/cookbooks@master] elastic: rolling reimage is missing req os arg

https://gerrit.wikimedia.org/r/798973

Change 798973 merged by Ryan Kemper:

[operations/cookbooks@master] elastic: rolling reimage is missing req os arg

https://gerrit.wikimedia.org/r/798973

Mentioned in SAL (#wikimedia-operations) [2022-05-24T19:30:16Z] <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Mentioned in SAL (#wikimedia-operations) [2022-05-24T19:31:03Z] <ryankemper@cumin1001> END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Mentioned in SAL (#wikimedia-operations) [2022-05-24T19:43:11Z] <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin1001 for host relforge1003.eqiad.wmnet with OS bullseye

Change 798981 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/cookbooks@master] elastic: log return value of reimage cookbook

https://gerrit.wikimedia.org/r/798981

Change 798981 merged by Ryan Kemper:

[operations/cookbooks@master] elastic: log return value of reimage cookbook

https://gerrit.wikimedia.org/r/798981
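
For readers following the two fixes above (passing the required OS argument to the reimage, then logging its return value), here is a minimal sketch of the idea in plain Python. It is not the code from the merged patches and does not use the Spicerack API. The function names, the `run_cookbook` callable, and the `--os` / `-t` argument layout are all assumptions made for illustration.

```python
import logging
from typing import Callable, Dict, List, Sequence

logger = logging.getLogger("rolling_reimage_sketch")

# Hypothetical signature for "run a cookbook by name with these args and
# return its integer exit code". This is an assumption, not a Spicerack API.
CookbookRunner = Callable[[str, List[str]], int]


def build_reimage_args(host: str, os_name: str, task_id: str) -> List[str]:
    """Build the argument list for a (hypothetical) reimage sub-call.

    Fails early when the OS name is missing, instead of letting the
    sub-cookbook fail on its own argument parsing.
    """
    if not os_name:
        raise ValueError("an OS name (for example 'bullseye') is required")
    # Assumed argument layout, for illustration only.
    return ["--os", os_name, "-t", task_id, host]


def rolling_reimage(
    hosts: Sequence[str],
    os_name: str,
    task_id: str,
    run_cookbook: CookbookRunner,
) -> Dict[str, int]:
    """Reimage hosts one at a time, logging each return value."""
    results: Dict[str, int] = {}
    for host in hosts:
        args = build_reimage_args(host, os_name, task_id)
        ret = run_cookbook("sre.hosts.reimage", args)
        # Log the return value so a failed reimage is visible in the output.
        logger.info("reimage of %s returned %d", host, ret)
        results[host] = ret
        if ret != 0:
            # Stop the rolling operation rather than move on to the next node.
            raise RuntimeError(f"reimage of {host} failed with return code {ret}")
    return results
```

Injecting the runner as a parameter keeps the sketch testable without touching real hosts; in practice the rolling-operation cookbook would supply whatever mechanism it actually uses to invoke the reimage.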

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin1001 for host relforge1003.eqiad.wmnet with OS bullseye completed:

  • relforge1003 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202205241949_ryankemper_1411783_relforge1003.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-05-24T21:34:11Z] <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reimage - ryankemper@cumin1001 - T308606

Change 800233 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/cookbooks@master] elasticsearch: add more reimage usage examples

https://gerrit.wikimedia.org/r/800233

Change 800233 merged by Ryan Kemper:

[operations/cookbooks@master] elasticsearch: add more reimage usage examples

https://gerrit.wikimedia.org/r/800233

Change 800244 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/cookbooks@master] elasticsearch: add ANSI color codes

https://gerrit.wikimedia.org/r/800244

Change 800244 abandoned by Ryan Kemper:

[operations/cookbooks@master] elasticsearch: add ANSI color codes

Reason:

not needed

https://gerrit.wikimedia.org/r/800244