Got an email about puppet failing on one of the canary VMs. Checked the project and there's a bunch of instances; looking into it.
Description
Related Objects
- Mentioned In
- T275354: Puppet failures on many canary machines
Event Timeline
Comment Actions
Mentioned in SAL (#wikimedia-cloud) [2021-02-18T08:56:27Z] <dcaro> canary instances seem to be stuck, looking (T275111)
Comment Actions
I was confusing these canaries with the nova-fullstack tests; these are not leftovers, but are meant to be up continuously:
Checking why puppet failed on this one.
Comment Actions
It seems to be out of memory and puppet crashes before finishing the run:

dcaro@canary1022-01:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:            481         348          18           5         115         115
Swap:             0           0           0
There's a process called 'diamond' running that takes up most of the memory. I'll restart the machine, but if it happens again it might be worth taking a closer look.
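For the record, a generic way to spot which process is eating the memory (a sketch, not the exact command used during this incident) is to sort ps output by resident set size:

```shell
# Show the five processes using the most resident memory, highest first.
# Column 4 is %MEM, column 6 is RSS in KiB; 'diamond' would top this list here.
ps aux --sort=-rss | head -n 6
```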
Comment Actions
That seemed to do the trick. Weird that it runs diamond when we use prometheus by default...
Anyhow, I'll spend more time on it if it happens again.