
Failed to create volume on cloud-vps project wikidata-dev
Closed, Resolved (Public)

Description

I was following the guide at https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances#Cinder:_Attachable_Block_Storage_for_Cloud_VPS and attempted to create a 20GB volume named product_testing.

When I went to attach it to my instance, it failed with an error in a popup that I was unable to capture. The action_log of the instance (wb-product-testing) doesn't contain anything.

The volume's message log contains:

ID	Message Level	Event Id	User Message	Created At	Guaranteed Until
34a2103d-3bea-46f6-bff2-588448dd375e	ERROR	VOLUME_VOLUME_008_012	create volume from backend storage:Driver failed to create the volume.	2021-05-06T10:46:31.000000	2021-06-05T10:46:31.000000
04e9d663-b5c8-4940-866f-affcb126ca8f	ERROR	VOLUME_VOLUME_001_003	schedule allocate volume:Could not find any available weighted backend.	2021-05-06T10:46:31.000000	2021-06-05T10:46:31.000000
6cddbd6a-cd02-4542-9356-93e019c1a8a7	ERROR	VOLUME_VOLUME_008_012	create volume from backend storage:Driver failed to create the volume.	2021-05-06T10:46:16.000000	2021-06-05T10:46:16.000000
caddfd67-7040-4a1d-b0f0-eac01cf23186	ERROR	VOLUME_VOLUME_008_012	create volume from backend storage:Driver failed to create the volume.	2021-05-06T10:46:01.000000	2021-06-05T10:46:01.000000
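For anyone triaging a similar message table, here is a minimal sketch in Python that tallies the rows by Event Id. It assumes the table was copied as tab-separated text with the header row shown above; the function name `count_events` is hypothetical and not part of any OpenStack tooling.

```python
import csv
import io
from collections import Counter


def count_events(tsv_text):
    """Count messages per Event Id in a tab-separated message table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["Event Id"] for row in reader)


# The four rows from the table above, rebuilt as tab-separated text.
HEADER = ["ID", "Message Level", "Event Id", "User Message",
          "Created At", "Guaranteed Until"]
DRIVER_FAILED = "create volume from backend storage:Driver failed to create the volume."
NO_BACKEND = "schedule allocate volume:Could not find any available weighted backend."
ROWS = [
    ["34a2103d-3bea-46f6-bff2-588448dd375e", "ERROR", "VOLUME_VOLUME_008_012",
     DRIVER_FAILED, "2021-05-06T10:46:31.000000", "2021-06-05T10:46:31.000000"],
    ["04e9d663-b5c8-4940-866f-affcb126ca8f", "ERROR", "VOLUME_VOLUME_001_003",
     NO_BACKEND, "2021-05-06T10:46:31.000000", "2021-06-05T10:46:31.000000"],
    ["6cddbd6a-cd02-4542-9356-93e019c1a8a7", "ERROR", "VOLUME_VOLUME_008_012",
     DRIVER_FAILED, "2021-05-06T10:46:16.000000", "2021-06-05T10:46:16.000000"],
    ["caddfd67-7040-4a1d-b0f0-eac01cf23186", "ERROR", "VOLUME_VOLUME_008_012",
     DRIVER_FAILED, "2021-05-06T10:46:01.000000", "2021-06-05T10:46:01.000000"],
]
TABLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

if __name__ == "__main__":
    for event_id, count in count_events(TABLE).most_common():
        print(event_id, count)
```

Three of the four messages are driver failures 15 seconds apart, followed by a scheduler message saying no backend was available.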

Event Timeline

aborrero triaged this task as Medium priority. May 6 2021, 11:09 AM
aborrero added subscribers: Andrew, dcaro.

Mentioned in SAL (#wikimedia-cloud) [2021-05-06T11:14:18Z] <dcaro> restarting cinder-volume on the eqiad control nodes to refresh the ceph libraries (T282109)

This was caused by the cinder-volume service using an old version of the ceph libraries (see T280641).

In Kibana, the logs showed:

2021-05-06 10:46:31.699 6509 ERROR cinder.volume.manager cinder.exception.VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Error connecting to ceph cluster.
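To spot the same failure in exported logs, a rough Python sketch like the following could flag matching lines. It assumes each line follows the layout of the one quoted above (timestamp, process ID, level, logger, message); the regular expression and the function name `is_ceph_connection_error` are illustrative assumptions, not part of cinder.

```python
import re

# Assumed layout, inferred from the single log line quoted above:
# "<date> <time> <pid> <LEVEL> <logger> <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) (?P<message>.*)$"
)

CEPH_ERROR = "Error connecting to ceph cluster"


def is_ceph_connection_error(line):
    """Return True if the line is an ERROR entry reporting a ceph connection failure."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return False
    return match.group("level") == "ERROR" and CEPH_ERROR in match.group("message")


SAMPLE = (
    "2021-05-06 10:46:31.699 6509 ERROR cinder.volume.manager "
    "cinder.exception.VolumeBackendAPIException: Bad or unexpected response "
    "from the storage volume backend API: Error connecting to ceph cluster."
)

if __name__ == "__main__":
    print(is_ceph_connection_error(SAMPLE))
```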

Restarted the cinder-volume service on all the cloudcontrol nodes (1003, 1004 and 1005) and retried; that worked.