16:52:01 <Lucas_WMDE> I have a feeling that CI builds are “Waiting for the completion of castor-save-workspace-cache” a bit longer than usual lately… would it be possible to give castor more resources? (no idea if that even makes sense tbh)
17:13:30 <•hashar> Lucas_WMDE: the job doesn't run concurrently (I can't remember why)
17:13:39 <•hashar> and it transfers the whole cache of the job, so maybe those have grown
17:13:58 <•hashar> and indeed there are a lot of them :
17:13:59 <•hashar> )
17:15:20 <•hashar> the caches are stored on integration-castor05.integration.eqiad1.wikimedia.cloud
17:15:33 <•hashar> and `iotop -o` shows rsync using 99% of the available disk io
17:18:02 <•hashar> and if I remember well the data is written to an attached volume
17:19:42 <•hashar> ah found it g3.cores8.ram36.disk20
17:19:57 <•hashar> it lacks the 4x increase of disk io that other instances have
17:20:10 <•hashar> so openstack throttles the disk io made to the shared volume / Ceph
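The diagnosis from the log above can be reproduced on the castor instance; a minimal sketch, assuming root access on the instance (the cache directory path is an assumption, not taken from the log):

```shell
# On integration-castor05.integration.eqiad1.wikimedia.cloud:

# Show only processes currently performing disk I/O
# (this is how rsync was seen using 99% of the available disk io).
sudo iotop -o

# Estimate how large the per-job caches have grown
# (the path below is hypothetical; use the actual cache root).
sudo du -sh /srv/castor/* | sort -h | tail
```

If rsync dominates the I/O while total throughput stays low, the bottleneck is the throttle rather than the cache size alone.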
The reason is that integration-castor05.integration.eqiad1.wikimedia.cloud uses the flavor g3.cores8.ram36.disk20, whose disk I/O is rate limited (the default).
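The rate limit can be confirmed by comparing the flavor's properties against one of the 4xiops flavors; a sketch assuming the standard OpenStack client with admin credentials (the `quota:*` extra specs are the usual Nova mechanism for disk I/O throttling, but their exact names and values on this cloud are an assumption):

```shell
# Show the extra specs (including any quota:disk_*_iops_sec throttles)
# of the current flavor and of a 4xiops flavor, for comparison.
openstack flavor show g3.cores8.ram36.disk20 -c properties
openstack flavor show g3.cores8.ram24.disk20.ephemeral90.4xiops -c properties
```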
The fix is to migrate the instance to a flavor with raised disk I/O throttling; such flavors have 4xiops in their name. The available flavors are:
- g3.cores8.ram24.disk20.ephemeral90.4xiops
- g3.cores8.ram24.disk20.ephemeral60.4xiops
- g3.cores8.ram24.disk20.ephemeral40.4xiops
However they have:
- 24G of RAM, whereas the current one has 36G (I had picked 36G to benefit from the Linux disk cache).
- an ephemeral disk, which makes it impossible to change the flavor later on (T340825: OpenStack silently fail to resize an Ephemeral volume).
So I guess the easiest solution is to create a duplicate of the currently used flavor with the 4xiops throttle: g3.cores8.ram36.disk20.4xiops. Since it has no ephemeral disk, we can then change the flavor of the instance (which restarts it) and get the new I/O throttle.
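That plan could look roughly like the following, using the standard OpenStack client (a sketch, not a definitive runbook: the RAM/disk sizes are taken from the flavor name, while the iops quota values shown are placeholders that must be copied from an existing 4xiops flavor, and running this requires cloud admin rights):

```shell
# Create a duplicate of the current flavor, but with the raised I/O limits.
# --ram is in MiB (36G = 36864); the quota values below are placeholders,
# to be replaced with the values read from a real 4xiops flavor.
openstack flavor create g3.cores8.ram36.disk20.4xiops \
  --vcpus 8 --ram 36864 --disk 20 \
  --property quota:disk_read_iops_sec=20000 \
  --property quota:disk_write_iops_sec=20000

# Resize the instance to the new flavor (this restarts it), then confirm
# the resize so Nova finalizes the move.
openstack server resize --flavor g3.cores8.ram36.disk20.4xiops integration-castor05
openstack server resize confirm integration-castor05
```

Since the new flavor has no ephemeral disk, a later resize back (or to another non-ephemeral flavor) remains possible, avoiding the T340825 trap.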