Investigate use of Nodepool ListFloatingIPsTask
Closed, ResolvedPublic

Description

Nodepool queries to OpenStack are represented as tasks, which are enqueued and processed one by one at a given rate (e.g. every 10 seconds).
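To illustrate the queueing behaviour described above, here is a minimal sketch of a rate-limited task queue. It is not Nodepool's actual implementation: the class names `Task` and `RateLimitedTaskManager`, the `rate` parameter and the injectable `clock`/`sleep` arguments are all assumptions made for the example.

```python
import queue
import time


class Task:
    """One API call to the cloud (hypothetical stand-in for a Nodepool task)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.result = None

    def run(self):
        self.result = self.func()


class RateLimitedTaskManager:
    """Runs queued tasks one by one, starting at most one per `rate` seconds."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self._queue = queue.Queue()
        self._clock = clock      # injectable so the pacing can be tested
        self._sleep = sleep
        self._last_start = None  # start time of the previous task, if any

    def submit(self, task):
        self._queue.put(task)

    def run_pending(self):
        """Drain the queue and return the task names in execution order."""
        finished = []
        while not self._queue.empty():
            task = self._queue.get()
            if self._last_start is not None:
                # Wait until `rate` seconds have passed since the last start.
                wait = self.rate - (self._clock() - self._last_start)
                if wait > 0:
                    self._sleep(wait)
            self._last_start = self._clock()
            task.run()
            finished.append(task.name)
        return finished
```

With such a queue, every extra task type delays all the tasks behind it, which is why an unneeded recurring task is worth removing.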

I have noticed the ListFloatingIPsTask task, which grabs the list of IPs (public?). The results are cached internally for 5 seconds, and the task is a major source of API requests.

However, we have no floating IPs allocated:

$ nova absolute-limits|grep FloatingIps
| FloatingIps        | 0     | 0      |

Internally, Nodepool retrieves them whenever Nova has the os-floating-ips extension.

We should most probably hack our Nodepool to stop issuing those requests.

Event Timeline

The bit:

IPS_LIST_AGE = 5      # How long to keep a cached copy of the ip list

It was removed when python-shade was introduced with:

commit e1f4a12949016e57db9b716b00f0f80f90329e74
Author: Monty Taylor <mordred@inaugust.com>
Date:   Tue Sep 22 21:37:54 2015 -0500

    Use shade for all OpenStack interactions
    
    We wrote shade as an extraction of the logic we had in nodepool, and
    have since expanded it to support more clouds. It's time to start
    using it in nodepool, since that will allow us to add more clouds
    and also to handle a wider variety of them.
    
    Making a patch series was too tricky because of the way fakes and
    threading work, so this is everything in one stab.
    
    Depends-On: I557694b3931d81a3524c781ab5dabfb5995557f5
    Change-Id: I423716d619aafb2eca5c1748bc65b38603a97b6a
    Co-Authored-By: James E. Blair <jeblair@linux.vnet.ibm.com>
    Co-Authored-By: David Shrewsbury <shrewsbury.dave@gmail.com>
    Co-Authored-By: Yolanda Robla <yolanda.robla-mota@hpe.com>

That removed the related code, so ultimately we want to get python-shade (T107267) and upgrade Nodepool.

Meanwhile a custom hack would be good enough.

Looking at debug messages over four days:

$ grep 'wmflabs.*running task' /var/log/nodepool/debug.log*|cut -d\  -f9|cut -d\. -f3|sort|uniq -c|sort -rn
   6266 ListServersTask
   3578 DeleteServerTask
   3578 CreateServerTask
   2267 ListFloatingIPsTask
     13 ListFlavorsTask
     13 ListExtensionsTask
      8 AddKeypairTask
      5 ListKeypairsTask
      5 GetServerTask
      5 DeleteKeypairTask
      2 FindImageTask
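As a quick check of how much of the traffic that represents, the counts above can be summed and the share of ListFloatingIPsTask computed. The helper name `task_share` is made up for this example; the numbers are copied from the listing.

```python
# Task counts copied from the debug log summary above.
task_counts = {
    "ListServersTask": 6266,
    "DeleteServerTask": 3578,
    "CreateServerTask": 3578,
    "ListFloatingIPsTask": 2267,
    "ListFlavorsTask": 13,
    "ListExtensionsTask": 13,
    "AddKeypairTask": 8,
    "ListKeypairsTask": 5,
    "GetServerTask": 5,
    "DeleteKeypairTask": 5,
    "FindImageTask": 2,
}


def task_share(counts, name):
    """Return the fraction of all tasks that belong to `name`."""
    return counts[name] / sum(counts.values())


total = sum(task_counts.values())
share = task_share(task_counts, "ListFloatingIPsTask")
print(f"{total} tasks, ListFloatingIPsTask share: {share:.1%}")
# prints "15740 tasks, ListFloatingIPsTask share: 14.4%"
```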

The code path summary is:

nodepool/provider_manager.py
def cleanupServer():
    if self.hasExtension('os-floating-ips'):
        self.listFloatingIPs()
             --> self.submitTask(ListFloatingIPsTask())
        self.deleteFloatingIP()

So whenever the cloud provider supports the floating IPs extension, a query is made unconditionally, even if the server does not use a public IP. The reason is that Nodepool does not differentiate between a server with a private IP and one that had a floating IP assigned to it.

Since we do not use floating IPs at all, we can just short-circuit the condition with a False and ....
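A rough sketch of what such a short-circuit could look like follows. This is not the actual patch: `FakeProviderManager`, the `USE_FLOATING_IPS` switch and the `submitted` list are hypothetical, and only the method names `cleanupServer`, `hasExtension` and `listFloatingIPs` come from the code path summary.

```python
class FakeProviderManager:
    """Hypothetical stand-in for the provider manager, for illustration only."""

    # Assumption: a local switch; a real hack might simply hardcode False
    # in the condition instead.
    USE_FLOATING_IPS = False

    def __init__(self, extensions):
        self.extensions = set(extensions)
        self.submitted = []  # names of the tasks that would hit the API

    def hasExtension(self, name):
        return name in self.extensions

    def listFloatingIPs(self):
        self.submitted.append("ListFloatingIPsTask")
        return []

    def cleanupServer(self, server_id):
        # `False and ...` never evaluates the right-hand side, so no
        # ListFloatingIPsTask is submitted even when the cloud advertises
        # the os-floating-ips extension.
        if self.USE_FLOATING_IPS and self.hasExtension("os-floating-ips"):
            self.listFloatingIPs()
        self.submitted.append("DeleteServerTask")


manager = FakeProviderManager(["os-floating-ips"])
manager.cleanupServer("some-server-id")
print(manager.submitted)
```

With the switch off, only the server deletion is submitted; flipping it back to True restores the original behaviour for clouds that do use floating IPs.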

Change 309406 had a related patch set uploaded (by Hashar):
WMF: stop triggering ListFloatingIPsTask entirely

https://gerrit.wikimedia.org/r/309406

Change 309435 had a related patch set uploaded (by Hashar):
Add patch stop triggering ListFloatingIPsTask entirely

https://gerrit.wikimedia.org/r/309435

Change 309406 merged by Hashar:
WMF: stop triggering ListFloatingIPsTask entirely

https://gerrit.wikimedia.org/r/309406

Change 309435 merged by Hashar:
Add patch stop triggering ListFloatingIPsTask entirely

https://gerrit.wikimedia.org/r/309435

Change 309464 had a related patch set uploaded (by Hashar):
0.1.1-wmf5: gbp.conf / ListFloatingIPsTask

https://gerrit.wikimedia.org/r/309464

Change 309464 merged by Hashar:
0.1.1-wmf5: gbp.conf / ListFloatingIPsTask

https://gerrit.wikimedia.org/r/309464

We will want to upgrade Nodepool to 0.1.1-wmf5.

Packages built and published at:

https://people.wikimedia.org/~hashar/debs/nodepool_0.1.1-wmf5/

Nodepool has been upgraded. It should no longer query OpenStack for the list of floating IPs.

This can be checked via https://grafana.wikimedia.org/dashboard/db/nodepool or with grep ListFloatingIPsTask /var/log/nodepool/debug.log

Will close this task if all is fine :)

hashar changed the task status from Stalled to Open. Sep 22 2016, 9:12 PM

I have checked on labnet1001 and there are definitely no more requests for /v2/contintcloud/os-floating-ips flowing in.

Out of 3200 queries, 430 were for floating IPs. Dropping them reduces the load on the OpenStack API and speeds up Nodepool.