Yesterday's incident (T254491) erased our cumin keys. Fortunately, the active keys are still cached in keyholder, and the matching public keys still seem to be accepted on most VM hosts. We need to generate new keys without losing contact with VMs in the meantime:
[] disable puppet crons on cloud-cumin-01 and cloud-cumin-02
[] enable puppet fleet-wide with:
sudo cumin --force --timeout 30 -o json "A:all" "puppet agent --enable"
[] wait 30+ minutes
[] assess where puppet is and isn't running with:
sudo cumin --force --timeout 30 -o json "A:all" "/usr/local/lib/nagios/plugins/check_puppetrun -w 3600 -c 86400"
[] repeat steps 2-4 (enable puppet, wait, assess) as needed
[] create and commit new cumin keys
[] wait 30+ minutes
[] manually run puppet on cloud-cumin-02, then run 'keyholder arm' there
** At this point we should have most VMs answering to cloud-cumin-02, and a few stragglers that have not yet picked up the new key still answering to cloud-cumin-01.
[] repair those stragglers, if necessary by copying in the new key by hand
[] run puppet and keyholder arm on cloud-cumin-01
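The assessment step produces one check_puppetrun result per VM, which is tedious to read by eye across the whole fleet. Below is a minimal sketch of a filter that pulls out the hosts still needing attention. `puppet_stragglers` is a hypothetical helper, not part of cumin, and it assumes the `-o json` output maps each hostname to its output, pretty-printed one `"host": "output"` pair per line. Hosts that never answered will probably not appear in that JSON at all, so cumin's own failure summary is still worth checking.

```shell
#!/bin/sh
# puppet_stragglers: hypothetical helper, not part of cumin.
# Reads cumin "-o json" output on stdin and prints, once each, the hosts
# whose check_puppetrun result does not start with "OK" (WARNING,
# CRITICAL, UNKNOWN, ...).
# ASSUMPTION: the JSON maps each hostname to that host's output and is
# pretty-printed with one '"host": "output"' pair per line. Every other
# line (cumin's grouped report, separators, braces) is ignored.
puppet_stragglers() {
    awk '
        /^ *"[^"]+": "/ {
            line = $0
            sub(/^ *"/, "", line)         # drop indentation and the opening quote
            host = line
            sub(/".*/, "", host)          # the hostname ends at the next quote
            value = line
            sub(/^[^"]+": "/, "", value)  # the plugin output follows the key
            if (value !~ /^OK/) print host
        }
    ' | sort -u
}

# Example use, from the host cumin normally runs on:
# sudo cumin --force --timeout 30 -o json "A:all" \
#     "/usr/local/lib/nagios/plugins/check_puppetrun -w 3600 -c 86400" \
#     | puppet_stragglers
```

Anything it prints is a candidate for another enable/wait/assess round, or for the hand repair at the end.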