Add new cloud-cumin keys
Closed, Resolved, Public

Description

Yesterday's incident (T254491) erased our cumin keys. Fortunately, the active keys are still cached in keyholder, and seem to be active on most VM hosts. We need to generate new keys without losing contact with VMs in the meantime:

    1. disable puppet crons on cloud-cumin-01 and cloud-cumin-02
    2. enable puppet fleet-wide with: sudo cumin --force --timeout 30 -o json "A:all" "puppet agent --enable"
    3. wait 30+ minutes
    4. assess where puppet is and isn't running with: sudo cumin --force --timeout 30 -o json "A:all" "/usr/local/lib/nagios/plugins/check_puppetrun -w 3600 -c 86400"
    5. repair puppet failures that are straightforward to repair
    6. repeat steps 2-4 as needed
    7. create and commit new cumin keys
    8. wait 30+ minutes
    9. manually run puppet on cloud-cumin-02 and 'keyholder arm' there
  • At this point we should have most VMs answering to cloud-cumin-02, and a few stragglers answering to cloud-cumin-01
    1. eliminate those stragglers, if necessary by copying in the new key by hand
    2. run puppet and keyholder arm on cloud-cumin-01
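
The assessment in step 4 could be summarized with a small script along these lines. This is only a sketch under stated assumptions: it assumes the JSON from `-o json` has been isolated into a string that maps each hostname to that host's command output, and that check_puppetrun follows the usual Nagios convention of starting its output with a status word (OK, WARNING, CRITICAL, UNKNOWN). Neither is confirmed in this task. The `classify` and `summarize` helpers, the hostnames, and the sample messages are hypothetical.

```python
import json

# Assumption: the check output starts with a Nagios status word.
STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def classify(outputs):
    """Group hosts by the status word that starts their check output.

    `outputs` is assumed to be a dict of hostname -> output string.
    Anything that does not start with a known status word is filed
    under UNKNOWN so that it gets looked at by hand.
    """
    groups = {status: [] for status in STATUSES}
    for host, text in sorted(outputs.items()):
        # Accept both "OK: ..." and "OK - ..." style prefixes.
        first = text.strip().split(":", 1)[0].split(" ", 1)[0].upper()
        status = first if first in groups else "UNKNOWN"
        groups[status].append(host)
    return groups


def summarize(json_text):
    """Parse a JSON host -> output mapping and classify the hosts."""
    return classify(json.loads(json_text))


if __name__ == "__main__":
    # Hypothetical sample data, not real check_puppetrun output.
    sample = json.dumps({
        "vm-a.example": "OK: puppet ran recently",
        "vm-b.example": "CRITICAL: puppet has not run",
        "vm-c.example": "",
    })
    for status, hosts in summarize(sample).items():
        print(f"{status}: {len(hosts)} host(s) {hosts}")
```

Hosts that end up under CRITICAL or UNKNOWN would be the candidates for the repairs in step 5 before repeating the earlier steps.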

Event Timeline

Andrew updated the task description.

Mentioned in SAL (#wikimedia-cloud) [2020-06-05T15:08:57Z] <andrewbogott> trying to re-enable puppet without losing cumin contact, as per https://phabricator.wikimedia.org/T254589

Change 602755 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[labs/private@master] Replace cumin public key for cloud VMs

https://gerrit.wikimedia.org/r/602755

Change 602755 merged by Andrew Bogott:
[labs/private@master] Replace cumin public key for cloud VMs

https://gerrit.wikimedia.org/r/602755

I'm at step 9, taking a pause because I still need to do some other things this afternoon.

Change 602788 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[labs/private@master] Add an even newer cloud-vps cumin key

https://gerrit.wikimedia.org/r/602788

Change 602788 merged by Andrew Bogott:
[labs/private@master] Add an even newer cloud-vps cumin key

https://gerrit.wikimedia.org/r/602788

Andrew claimed this task.

cumin now has access to 736/738 instances with the new key, so I'm calling this done.