Getting the netbox module into the cookbooks will save steps on decoms, and probably on reimages and installs (which share many procedures). The caveat is that for decoms it will have to prompt for the state to transition into (decom or spare).
After several conversations with robh, I think we can start looking at the low-hanging fruit. For the record, all of these processes are mediated by a dynamic, ever-changing checklist.
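For illustration, the prompt mentioned above could look roughly like the sketch below. It is a sketch only: the function name, the prompt text and the retry limit are made up for the example, and the two state labels are simply the ones named in the comment, not necessarily the status values netbox uses.

```python
# Hypothetical sketch: ask the operator which state a decommissioned host
# should move into. Names and labels here are illustrative assumptions.
VALID_STATES = ("decom", "spare")


def ask_target_state(read=input, max_attempts=3):
    """Return 'decom' or 'spare' as chosen by the operator.

    `read` is injectable so the prompt can be exercised without a terminal.
    """
    for _ in range(max_attempts):
        answer = read("Transition host to which state? [decom/spare]: ")
        answer = answer.strip().lower()
        if answer in VALID_STATES:
            return answer
    # Refuse to guess: a wrong state is worse than an aborted run.
    raise ValueError("no valid state given after %d attempts" % max_attempts)
```

Raising instead of falling back to a default keeps the choice with the operator, which seems to be the point of prompting in the first place.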
Wed, Apr 17
exclude esams from console report
robh requests that the status show up in test_netbox_in_puppetdb
Tue, Apr 16
Fri, Apr 12
Thanks for refiguring the checklist :)
Thu, Apr 11
Wed, Apr 10
Okay, the only question that still seems open to me is: how does the service map serial to FQDN?
Tue, Apr 9
It'd be neat if the code written for spicerack for this purpose could be reused somehow.
Mon, Apr 8
Thu, Apr 4
Wed, Apr 3
I suppose the conversation we need is:
Tue, Apr 2
Minor suggestion: perhaps we could increase the alert threshold if operation isn't actually affected at these levels. Quite often kubelet will sit right at the alert threshold and cause the alerts to flap.
Thu, Mar 28
Well, that's convenient. The wikitech page about resizing recommends checking the grafana views for this information.
Incidentally, this is the strategy we're pursuing anyway. For the time being, write operations will take the form of remote execution in the cookbook, while the ganeti module will provide information to said cookbooks.
Additional follow-up: there were numerous OOMs in the log, even though the box has around 20 GB of free RAM +/- buffers. I'm not sure if there's a service that spikes up that high or if it's the slice that's causing the OOM, but it's an interesting data point.
Just a note: the service was flapping for a while, and I have restarted it on scb1004.
Wed, Mar 27
Mar 22 2019
For the record, the latest patchset was just pending on me testing that null values work in puppetdb queries, which I've now done, so this should be coming soon.
Mar 21 2019
Mar 19 2019
Upside of this is that python3-ganeti-rapi is already in stretch-backports.
As part of the MakeVM port, it was requested that this be pursued as a stretch goal.
Mar 18 2019
Mar 16 2019
This is deployed and works in production; it only needs to have the timer deployed.
Mar 15 2019
The current workaround is to set the environment variables that it checks, like SUDO_USER=$USER USER=root cumin ..., but I strongly agree with this feature request, as I've made a similar one :)
I don't expect that to change all that often, but I agree that the script could take that into account (there is an API for those devices, of course). Now that it's in place it should be straightforward to modify.
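If that workaround has to be applied from a script rather than typed at a shell, something along these lines might do. It is a sketch, not anything cumin ships: both function names are hypothetical, and the only thing taken from the comment above is the pair of variables SUDO_USER and USER. The cumin arguments themselves are left to the caller.

```python
import os
import subprocess


def workaround_env(base_env=None):
    """Return a copy of the environment with SUDO_USER=$USER and USER=root.

    Mirrors the shell prefix `SUDO_USER=$USER USER=root`; the original
    mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # An unset $USER expands to an empty string in the shell, so do the same.
    env["SUDO_USER"] = env.get("USER", "")
    env["USER"] = "root"
    return env


def run_with_workaround(command):
    """Run `command` (a list of arguments) under the adjusted environment."""
    return subprocess.run(command, env=workaround_env(), check=False)
```

Building the environment in a separate function keeps it easy to inspect what the child process will see before anything is actually run.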