
Nodepool no more refresh snapshot images automatically
Closed, Resolved, Public

Description

Nodepool no longer refreshes the snapshot images from the base image 'ci-jessie-wikimedia'. The last update was on 2015-11-23 14:16:10.

2015-12-01 14:14:00,030 INFO nodepool.SnapshotImageUpdater: Creating image id: 379 with hostname ci-jessie-wikimedia-1448979240 for ci-jessie-wikimedia in wmflabs-eqiad
2015-12-01 14:15:20,680 ERROR nodepool.SnapshotImageUpdater: Exception updating image ci-jessie-wikimedia in wmflabs-eqiad:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nodepool/nodepool.py", line 900, in _run
    self.updateImage(session)
  File "/usr/lib/python2.7/dist-packages/nodepool/nodepool.py", line 1003, in updateImage
    image_id=image_id, config_drive=self.image.config_drive)
  File "/usr/lib/python2.7/dist-packages/nodepool/provider_manager.py", line 398, in createServer
    return self.submitTask(CreateServerTask(**create_args))
  File "/usr/lib/python2.7/dist-packages/nodepool/task_manager.py", line 119, in submitTask
    return task.wait()
  File "/usr/lib/python2.7/dist-packages/nodepool/task_manager.py", line 57, in run
    self.done(self.main(client))
  File "/usr/lib/python2.7/dist-packages/nodepool/provider_manager.py", line 116, in main
    server = client.servers.create(**self.args)
  File "/usr/lib/python2.7/dist-packages/novaclient/v2/servers.py", line 900, in create
    **boot_kwargs)
  File "/usr/lib/python2.7/dist-packages/novaclient/v2/servers.py", line 523, in _boot
    return_raw=return_raw, **kwargs)
  File "/usr/lib/python2.7/dist-packages/novaclient/base.py", line 161, in _create
    _resp, body = self.api.client.post(url, body=body)
  File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 453, in post
    return self._cs_request(url, 'POST', **kwargs)
  File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 428, in _cs_request
    resp, body = self._time_request(url, method, **kwargs)
  File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 397, in _time_request
    resp, body = self.request(url, method, **kwargs)
  File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 391, in request
    raise exceptions.from_response(resp, body, url, method)
BadRequest: Can not find requested image (HTTP 400)
2015-12-01 14:15:20,696 INFO nodepool.NodePool: Deleted image id: 379

The Nodepool configuration for the wmflabs-eqiad provider has:

providers:
  - name: wmflabs-eqiad
    service-type: 'compute'
    service-name: 'nova'
    project-id: 'contintcloud'
    region-name: 'eqiad'
    username: 'nodepoolmanager'
    ...
    images:
      - name: ci-jessie-wikimedia
        # RelEng manually builds and uploads the image to Glance
        base-image: ci-jessie-wikimedia

The base image does show up in Glance:

hashar@labnodepool1001:~$ become-nodepool 
nodepool@labnodepool1001:~$ openstack image list --private
+--------------------------------------+--------------------------------+
| ID                                   | Name                           |
+--------------------------------------+--------------------------------+
| 535da6fd-3d87-49b7-8987-044002770dba | ci-jessie-wikimedia-1448296278 |
| 931a1851-5773-4be4-aa5e-c8d01cdb8b52 | ci-jessie-wikimedia            |
| 02e5bace-3da2-4d98-8e4b-f82bd0c1873e | ci-jessie-wikimedia-1448294320 |
+--------------------------------------+--------------------------------+
nodepool@labnodepool1001:~$

Confirming that there is no trailing space in the image name:

$ openstack image list --private -f yaml
- {ID: !!python/unicode '535da6fd-3d87-49b7-8987-044002770dba', Name: !!python/unicode 'ci-jessie-wikimedia-1448296278'}
- {ID: !!python/unicode '931a1851-5773-4be4-aa5e-c8d01cdb8b52', Name: !!python/unicode 'ci-jessie-wikimedia'}
- {ID: !!python/unicode '02e5bace-3da2-4d98-8e4b-f82bd0c1873e', Name: !!python/unicode 'ci-jessie-wikimedia-1448294320'}
$ nova image-show ci-jessie-wikimedia
+----------------------+--------------------------------------+
| Property             | Value                                |
+----------------------+--------------------------------------+
| OS-EXT-IMG-SIZE:size | 1126485504                           |
| created              | 2015-11-23T16:30:43Z                 |
| id                   | 931a1851-5773-4be4-aa5e-c8d01cdb8b52 |
| metadata show        | true                                 |
| minDisk              | 0                                    |
| minRam               | 0                                    |
| name                 | ci-jessie-wikimedia                  |
| progress             | 100                                  |
| status               | ACTIVE                               |
| updated              | 2015-11-23T16:30:53Z                 |
+----------------------+--------------------------------------+
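The trailing-space check above was done by reading the YAML output by eye. A minimal Python sketch of the same check, assuming the image list has already been fetched as (ID, name) pairs; the helper name `match_base_image` is hypothetical and not part of Nodepool or the OpenStack clients:

```python
def match_base_image(images, base_name):
    """Split (id, name) pairs into exact name matches and near misses.

    A near miss is a name that only equals base_name once surrounding
    whitespace is stripped, e.g. a trailing space added at upload time.
    """
    exact = [image_id for image_id, name in images if name == base_name]
    near = [(image_id, name) for image_id, name in images
            if name != base_name and name.strip() == base_name]
    return exact, near


# Pairs copied from the `openstack image list --private -f yaml` output above.
images = [
    ("535da6fd-3d87-49b7-8987-044002770dba", "ci-jessie-wikimedia-1448296278"),
    ("931a1851-5773-4be4-aa5e-c8d01cdb8b52", "ci-jessie-wikimedia"),
    ("02e5bace-3da2-4d98-8e4b-f82bd0c1873e", "ci-jessie-wikimedia-1448294320"),
]
exact, near = match_base_image(images, "ci-jessie-wikimedia")
print(exact)  # ['931a1851-5773-4be4-aa5e-c8d01cdb8b52']
print(near)   # []
```

Exactly one ID matches and there are no near misses, which agrees with the manual check: the base image name itself looks fine.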

Seems I screwed something up the last time I created the image? :(

Event Timeline

hashar raised the priority of this task to Needs Triage.
hashar updated the task description.
hashar subscribed.
hashar set Security to None.

We can still update the snapshot manually, though:

nodepool@labnodepool1001:~$ nodepool image-update wmflabs-eqiad ci-jessie-wikimedia
2015-12-02 10:55:01,622 INFO nodepool.SnapshotImageUpdater: Creating image id: 380 with hostname ci-jessie-wikimedia-1449053701 for ci-jessie-wikimedia in wmflabs-eqiad
...
2015-12-02 10:58:25,786 INFO nodepool.SnapshotImageUpdater: Image ci-jessie-wikimedia-1449053701 in wmflabs-eqiad is ready
$ openstack image list --private
+--------------------------------------+--------------------------------+
| ID                                   | Name                           |
+--------------------------------------+--------------------------------+
| 32860eec-860c-4b01-b6a3-49c5034f527b | ci-jessie-wikimedia-1449053701 |  <-- new snapshot
| 535da6fd-3d87-49b7-8987-044002770dba | ci-jessie-wikimedia-1448296278 |
| 931a1851-5773-4be4-aa5e-c8d01cdb8b52 | ci-jessie-wikimedia            |
+--------------------------------------+--------------------------------+
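Judging by the log lines, the numeric suffix of each snapshot hostname is a Unix timestamp in UTC: ci-jessie-wikimedia-1449053701 decodes to 2015-12-02 10:55:01, the time logged when that snapshot was started. A small sketch to decode the suffixes, which makes it easy to see when each snapshot in `openstack image list` was taken. `snapshot_time` is a hypothetical helper, and the UTC timestamp interpretation is inferred from the logs rather than from Nodepool documentation:

```python
from datetime import datetime, timezone


def snapshot_time(hostname):
    """Return the UTC datetime encoded in a snapshot hostname.

    Assumes the hostname ends with '-<unix timestamp>', as in
    'ci-jessie-wikimedia-1449053701'.
    """
    _, _, suffix = hostname.rpartition("-")
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)


for name in ("ci-jessie-wikimedia-1448296278",
             "ci-jessie-wikimedia-1449053701"):
    print(name, snapshot_time(name).isoformat())
# ci-jessie-wikimedia-1448296278 2015-11-23T16:31:18+00:00
# ci-jessie-wikimedia-1449053701 2015-12-02T10:55:01+00:00
```

Decoded this way, both snapshots in the first `openstack image list` output date from 2015-11-23, which is consistent with no automatic refresh having happened after that day.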

Restarted the nodepool process on labnodepool1001.eqiad.wmnet.

hashar renamed this task from Nodepool to Nodepool no more refresh snapshot images automatically. (Dec 10 2015, 9:46 AM)
hashar claimed this task.

It properly created one on Dec 9th at 14:14 UTC and deleted the old one (about 48 hours old):

2015-12-09 14:14:00,027 INFO nodepool.SnapshotImageUpdater: Creating image id: 388 with hostname ci-jessie-wikimedia-1449670440 for ci-jessie-wikimedia in wmflabs-eqiad

2015-12-09 14:16:51,962 INFO nodepool.SnapshotImageUpdater: Image ci-jessie-wikimedia-1449670440 in wmflabs-eqiad is ready
2015-12-09 14:17:00,041 INFO nodepool.NodePool: Deleting image id: 386 which is 47.9952888511 hours old
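The age in the last log line is a fractional number of hours, just under 48. A short sketch of that arithmetic, working backwards from the log line to estimate when image 386 was recorded. It assumes the log times are UTC (consistent with the snapshot hostnames) and that the age was measured at roughly the time the line was logged; `age_hours` and `created_from_age` are illustrative helpers, not Nodepool functions:

```python
from datetime import datetime, timedelta, timezone


def age_hours(created, now):
    """Age of an image in fractional hours, like the value in the log line."""
    return (now - created).total_seconds() / 3600.0


def created_from_age(logged_at, hours):
    """Invert age_hours: estimate when an image record was created."""
    return logged_at - timedelta(hours=hours)


# Values from the 'Deleting image id: 386' log line (times assumed UTC).
deleted_at = datetime(2015, 12, 9, 14, 17, 0, 41000, tzinfo=timezone.utc)
created = created_from_age(deleted_at, 47.9952888511)
print(created.replace(microsecond=0).isoformat())  # 2015-12-07T14:17:17+00:00

# How far short of a full 48 hours the deleted image was, in seconds.
shortfall = timedelta(hours=48) - (deleted_at - created)
print(round(shortfall.total_seconds(), 2))  # 16.96
```

This places image 386 at about 14:17 UTC on Dec 7th, roughly 17 seconds short of 48 hours. It does not by itself tell us which retention rule Nodepool applied.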

Not sure what happened.