Investigate why nodepool keeps leaking instances and why it sometimes stops for no apparent reason
Closed, Resolved · Public


On 03/03/17 nodepool stopped working. The cause is not yet clear, but it may have been triggered by nodepool continually leaking instances.

These issues may all be bugs that have been fixed in a newer release; we are running a very old version of nodepool.

Paladox created this task. · Mar 3 2017, 4:11 PM
Restricted Application added a subscriber: Aklapper. · Mar 3 2017, 4:11 PM
Paladox triaged this task as High priority. · Mar 3 2017, 4:11 PM
chasemp assigned this task to Andrew. · Mar 3 2017, 4:27 PM

We merged a change that caused the Nova services to restart, which sent a number of in-flight instances into an error state, and some labvirts are coming back slowly. Hopefully it's all transient. @Andrew is babysitting this now to make sure.

Paladox raised the priority of this task from High to Unbreak Now! · Mar 3 2017, 5:23 PM

Guessing Unbreak Now!, since CI is down?

Restricted Application added subscribers: Jay8g, TerraCodes. · Mar 3 2017, 5:23 PM

Mentioned in SAL (#wikimedia-operations) [2017-03-03T17:34:59Z] <hashar> CI is mostly recovered. It could not spawn instances anymore. The queue is being processed and will take a while to complete. Check status on | T159543

Paladox lowered the priority of this task from Unbreak Now! to High. · Mar 3 2017, 6:14 PM
hashar closed this task as Resolved. · Mar 3 2017, 11:00 PM

Nova/OpenStack recovered, so the leaked instances could finally be deleted, and Nodepool was then able to refill the pool with fresh instances.