Every day around 14:30 UTC, Nodepool spawns an instance from the reference image ci-jessie-wikimedia in order to create a snapshot. The image is really just refreshed by pulling the latest material and rerunning the Puppet provisioning script we use.
The last known good snapshots are:
nodepool image-list
ID | Provider | Image | Hostname | Version | Image ID | Server ID | State | Age (hours) |
---|---|---|---|---|---|---|---|---|
482 | wmflabs-eqiad | ci-jessie-wikimedia | ci-jessie-wikimedia-1455550768 | 1455550768 | 73aab493-67bc-4adb-abec-045f5bc7b7d2 | 760b0b44-b123-4fc6-b793-259ec78c6edb | ready | 173.45 |
483 | wmflabs-eqiad | ci-jessie-wikimedia | ci-jessie-wikimedia-1455552377 | 1455552377 | 97ccc6a0-55f3-4731-a2ad-c3ac4a5a358f | 84efc5c9-28a9-463b-b358-e84d05e92007 | ready | 173.00 |
These date from Monday Feb 15th, 16:10 and 16:37 UTC (they are most probably snapshots I forced manually, since they were not taken at 14:30).
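For reference, the numeric suffix of each snapshot hostname (also shown in the Version column) looks like a Unix timestamp. The following is a minimal sketch for decoding it to UTC, assuming the suffix really is seconds since the epoch; the helper name `snapshot_time` is made up for illustration and is not part of Nodepool.

```python
from datetime import datetime, timezone


def snapshot_time(hostname: str) -> datetime:
    """Decode the trailing number of a snapshot hostname as a UTC datetime."""
    # Assumption: everything after the last '-' is seconds since the Unix epoch.
    epoch = int(hostname.rsplit("-", 1)[1])
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


# Hostnames copied from the `nodepool image-list` output above.
for name in ("ci-jessie-wikimedia-1455550768", "ci-jessie-wikimedia-1455552377"):
    print(name, snapshot_time(name).strftime("%A %Y-%m-%d %H:%M:%S UTC"))
```

The decoded value is presumably the moment Nodepool named the image, so it may be somewhat earlier than the time the snapshot actually reached the ready state.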
After that date, whenever Nodepool schedules an image refresh, it logs HTTP connections to labnet1002.eqiad.wmnet, the Nova API. It successfully:
- boots an instance named ci-jessie-wikimedia-<UNIX TIMESTAMP>
- provisions it
- I can log in to that instance just fine
After that, however, Nodepool gets stuck looping over and over, apparently waiting for something. @Andrew pasted Glance server-side logs showing that the image is somehow not found:
Snippet:
2016-02-22 20:57:14.373 15962 ERROR glance.registry.client.v1.client [req-39755c71-91e9-412a-9de9-463e02d46ac9 nodepoolmanager contintcloud - - -] Registry client request GET /images/54de67d5-621d-4666-9bb4-2b9c5fb62321 raised NotFound
Deleting the instance works fine.