
[cloudvirt-canary] Canaries are not going through
Closed, Resolved (Public)


Got an email about puppet failing on one of the canary VMs. Checked the project and there's a bunch of instances; looking.

Event Timeline

dcaro triaged this task as High priority. Feb 18 2021, 8:55 AM
dcaro created this task.

Mentioned in SAL (#wikimedia-cloud) [2021-02-18T08:56:27Z] <dcaro> canary instances seem to be stuck, looking (T275111)

I was mistaking these canaries for the nova-fullstack tests; these are not leftovers but are meant to be up continuously:

Checking why puppet failed on this one.

The one that failed is

It seems to be out of memory and puppet crashes before finishing the run:

dcaro@canary1022-01:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:            481         348          18           5         115         115
Swap:             0           0           0
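
For reading output like this quickly, the number to watch is 'available' relative to 'total'. A minimal sketch, assuming the procps column layout shown here ('total' is the 2nd field and 'available' the 7th field of the 'Mem:' line); the function name `mem_available_pct` is made up for illustration:

```shell
# Hypothetical helper: print available memory as a whole-number percentage of
# total, reading `free -m` style output on stdin.
# Assumes the procps layout: field 2 = total, field 7 = available.
mem_available_pct() {
    awk '/^Mem:/ { printf "%d\n", $7 * 100 / $2 }'
}

# Usage on a live host:
# free -m | mem_available_pct
```

For the numbers in this output that is 115 of 481 MiB, a bit under a quarter, with no swap to fall back on.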

There's a process called 'diamond' running that takes most of the memory. Will restart the machine, but if it happens again it might be worth taking a closer look.
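
If it does come back, one way to confirm which process is eating the memory is to sum resident memory per command name. This is only a sketch: the helper name `rss_by_command` is made up, and it assumes procps `ps` is available on the VM.

```shell
# Hypothetical helper: sum resident memory (RSS, in KiB) per command name and
# print the largest consumers first. Reads "rss command" lines on stdin, so it
# can be tried on captured output as well as on a live host.
rss_by_command() {
    awk '{
            r = $1                # first field is the RSS value
            $1 = ""               # drop it, leaving the command name
            sub(/^ +/, "")        # trim the leading separator
            rss[$0] += r
         }
         END { for (c in rss) printf "%d %s\n", rss[c], c }' |
        sort -rn | head -n "${1:-5}"
}

# Usage on a live host ("=" suppresses the column headers):
# ps -e -o rss= -o comm= | rss_by_command 5
```

Grouping by command name rather than by PID means several processes of the same daemon show up as one line, which is usually what matters on a VM this small.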

That seemed to do the trick. Weird that it uses diamond when we use prometheus by default...

Anyhow, will spend more time on it if it happens again.