Lower rate of Nodepool requests to OpenStack API
Closed, Resolved, Public

Description

Nodepool serializes all queries to the OpenStack API in a queue, which is processed once every "rate" seconds (the rate setting in nodepool.yaml). It is currently set to 6 seconds, i.e. 10 queries per minute.
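To make the queueing behaviour concrete, here is a minimal sketch of a queue that runs one request at a time, with at least "rate" seconds between starts. It is only an illustration of the idea described above, not Nodepool's actual code: the class name RateLimitedQueue, its methods, and the injectable clock/sleep parameters are all hypothetical.

```python
import queue
import time


class RateLimitedQueue:
    """Illustrative only (hypothetical class, not Nodepool's implementation).

    Runs queued callables one at a time, starting at most one
    every `rate` seconds.
    """

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        # clock and sleep are injectable so the pacing can be tested
        # without actually waiting.
        self._clock = clock
        self._sleep = sleep
        self._tasks = queue.Queue()

    def submit(self, func):
        """Queue a zero-argument callable (e.g. one API request)."""
        self._tasks.put(func)

    def drain(self):
        """Run every queued task, spacing starts at least `rate` seconds apart."""
        results = []
        last_start = None
        while not self._tasks.empty():
            if last_start is not None:
                wait = self.rate - (self._clock() - last_start)
                if wait > 0:
                    self._sleep(wait)
            last_start = self._clock()
            results.append(self._tasks.get()())
        return results
```

With rate=6, three queued requests are started at roughly t=0, t=6 and t=12 seconds, however quickly each one completes.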

That delays operations such as building or deleting instances. We can experiment with lowering it a bit, to a rate of 5 seconds or 12 queries per minute, then go even lower than that if the cloud infrastructure supports it.
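A quick back-of-the-envelope check of what the rate means for throughput and queueing delay. The helper names and the backlog of 20 pending requests are made up for illustration, and the model assumes strictly one request per "rate" seconds with the first one sent immediately.

```python
def queries_per_minute(rate_seconds):
    """Maximum number of API queries per minute for a given rate."""
    return 60.0 / rate_seconds


def worst_case_wait(pending, rate_seconds):
    """Seconds before the last of `pending` queued requests is sent.

    Assumes the first request goes out immediately and each following
    one is sent `rate_seconds` after the previous one.
    """
    return max(pending - 1, 0) * rate_seconds


# Hypothetical backlog of 20 queued requests, for illustration only.
for rate in (6, 5, 1):
    print(rate, queries_per_minute(rate), worst_case_wait(20, rate))
```

Under these assumptions, going from 6 to 5 seconds raises the ceiling from 10 to 12 queries per minute and cuts the wait for the last of 20 queued requests from 114 to 95 seconds.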

Historical context

Self note: git log --format=fuller -p -Grate: modules/nodepool/templates/nodepool.yaml.erb

We had a rate of 1 (60 queries per minute) since instance deletions took a while (T113359, 7bcff1d06a00ac0311ec0eb1b625b0fb08bfb315).

In August 2016 we experienced a quota issue that caused Nodepool to spam the OpenStack infrastructure with queries it could not honor, namely trying to boot new instances when the quota was considered full. The rate was changed from 1 back to 10 (T143016, a82ffc941aeae8da623199761d8731db8a5f7d2b).

It has been at 6 since 4f499c3257f79037c2f9152519d9c90f55479c49.

Thing to watch

During busy hours, Nodepool would effectively send queries at the maximum rate. That might cause issues on the Nova API and/or the labvirt nodes if too many instances get spawned or deleted at once.

Event Timeline

I have quickly talked to @chasemp about it. It is best done early in a given week in order to have the infra properly monitored / acted on.

Change 358601 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] nodepool: lower rate of queries from 6 to 5

https://gerrit.wikimedia.org/r/358601

hashar triaged this task as Medium priority. Jun 16 2017, 10:47 AM

Change 358601 merged by Andrew Bogott:
[operations/puppet@production] nodepool: lower rate of queries from 6 to 5

https://gerrit.wikimedia.org/r/358601

Keeping it open for monitoring. The OpenStack API might be struggling with the new rate of requests.

@hashar thank you for working through this and being diligent. I'm closing this for now. Cheers.

And on July 12th RabbitMQ apparently exploded, possibly due to the rate change (T170492).