I had an issue with an instance (T133652) that could no longer reach /dev/vda. Looking at Nodepool, it is unable to delete or spawn instances over the OpenStack API.
It seems Keystone, Nova, or some other OpenStack component is deadlocked somehow :(
The first error in the Nodepool logs is at 05:13 UTC.
Attempting to spawn an instance fails, with the server ending up in ERROR status:
2016-04-26 05:13:17,416 ERROR nodepool.NodeLauncher: LaunchStatusException launching node id: 83522 in provider: wmflabs-eqiad error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nodepool/nodepool.py", line 337, in _run
    dt = self.launchNode(session)
  File "/usr/lib/python2.7/dist-packages/nodepool/nodepool.py", line 403, in launchNode
    server['status']))
LaunchStatusException: Server 882f2ef7-ad9b-4e9f-9e01-86e788a39ed4 for node id: 83522 status: ERROR
Ditto for deletion:
2016-04-26 05:23:22,611 ERROR nodepool.NodeDeleter: Exception deleting node 83522:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nodepool/nodepool.py", line 297, in run
    self.nodepool._deleteNode(session, node)
  File "/usr/lib/python2.7/dist-packages/nodepool/nodepool.py", line 2159, in _deleteNode
    manager.waitForServerDeletion(node.external_id)
  File "/usr/lib/python2.7/dist-packages/nodepool/provider_manager.py", line 450, in waitForServerDeletion
    (server_id, self.provider.name)):
  File "/usr/lib/python2.7/dist-packages/nodepool/nodeutils.py", line 42, in iterate_timeout
    raise Exception("Timeout waiting for %s" % purpose)
Exception: Timeout waiting for server 882f2ef7-ad9b-4e9f-9e01-86e788a39ed4 deletion in wmflabs-eqiad
I also tried to create the instance castor2.integration.eqiad.wmflabs, but it never spawned :(
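For anyone reading the deletion traceback: the "Timeout waiting for ..." message comes from a polling loop that gives up after a deadline. Below is a rough sketch of that pattern, not the actual Nodepool code. Only the function name iterate_timeout and the message format are taken from the traceback; the parameters, the wait_for_deletion helper, and the server_exists callback are my own assumptions for illustration.

```python
import time


def iterate_timeout(max_seconds, purpose, interval=0.01):
    """Yield an attempt counter until max_seconds elapse, then raise.

    Hypothetical re-creation of the polling helper named in the traceback;
    the signature is an assumption.
    """
    start = time.monotonic()
    count = 0
    while time.monotonic() - start < max_seconds:
        count += 1
        yield count
        time.sleep(interval)
    # Same message format as the exception seen in the Nodepool log.
    raise Exception("Timeout waiting for %s" % purpose)


def wait_for_deletion(server_exists, server_id, provider, max_seconds=0.05):
    """Poll until server_exists(server_id) is False, or time out.

    server_exists is a hypothetical callback standing in for a query
    against the OpenStack API.
    """
    purpose = "server %s deletion in %s" % (server_id, provider)
    for _ in iterate_timeout(max_seconds, purpose):
        if not server_exists(server_id):
            return True
```

If the API never reports the server as gone (for example because the backend is stuck), the loop runs out of time and raises, which would match what the log shows.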