Toolsbeta harbor has filled up its disk.
Description
Related Objects
- Mentioned In
- T396038: toolsbeta paging
Event Timeline
Managed to get it down to 50% by manually cleaning up some images and running garbage collection. It took some fiddling to get gc to run, because gc needs redis and redis was down due to insufficient space: a classic chicken-and-egg problem.
root@toolsbeta-harbor-1:/srv/ops/harbor# df -h
Filesystem Size Used Avail Use% Mounted on
...
/dev/sdb 40G 19G 19G 50% /srv/ops/harbor/data
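The Use% figure in the df output above can also be checked from a script. This is a minimal sketch using Python's standard `shutil.disk_usage`; the function names and the 80% threshold are assumptions for illustration, not part of any existing toolsbeta tooling.

```python
import shutil


def usage_percent(path: str) -> float:
    """Return used space as a percentage of used + available space.

    This is roughly the ratio df reports as Use%: space reserved for
    root is excluded from the denominator.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


def is_over_threshold(path: str, threshold_percent: float = 80.0) -> bool:
    """True if the filesystem holding `path` is at or above the threshold.

    The 80% default is a hypothetical value chosen for this example.
    """
    return usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    import sys

    # On the harbor host the path of interest would be /srv/ops/harbor/data.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{target}: {usage_percent(target):.1f}% used")
    if is_over_threshold(target):
        print("WARNING: usage is at or above the threshold")
```

Something like this could warn before redis and the rest of harbor run out of space, though the right threshold depends on how quickly images accumulate.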
By default maintain-harbor handles image cleanup, but it seems to be configured to retain more images than the storage can handle (a matter of configuration, not a bug).
What we can do next is:
- Either change the maintain-harbor configuration to retain fewer images (which is effectively what I did by manually cleaning up more images than maintain-harbor is configured to clean up)
- Or increase storage from 40GiB to maybe 100GiB.
For now things should be back to normal. On Monday we'll decide which direction to take.
The retention rules are disabled in toolsbeta for some reason :/, let's re-enable unless someone was testing something specific:
We can also adapt the quotas to match the available space. For example, toolforge has a 100G quota right now; if we make it a bit smaller than the volume limit, extra space usage would make pushes to harbor fail instead of bringing down the whole harbor.
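The quota idea comes down to simple arithmetic: the combined project quotas should stay below the volume size, with some room to spare. A rough sketch follows, where the helper names and the 10% reserve are assumptions for illustration, and where harbor's own quota accounting may not match on-disk usage exactly:

```python
GIB = 1024 ** 3


def max_safe_quota_total(volume_bytes: int, reserve_percent: int = 10) -> int:
    """Largest combined quota that leaves `reserve_percent` of the volume free.

    The 10% default reserve is a hypothetical value, not a harbor setting.
    """
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in the range [0, 100)")
    # Integer arithmetic avoids floating-point rounding on byte counts.
    return volume_bytes * (100 - reserve_percent) // 100


def quotas_fit(volume_bytes: int, project_quotas: dict[str, int],
               reserve_percent: int = 10) -> bool:
    """True if the sum of all project quotas stays within the safe total."""
    total = sum(project_quotas.values())
    return total <= max_safe_quota_total(volume_bytes, reserve_percent)


if __name__ == "__main__":
    # Sizes from this task: a 40G volume and a 100G toolforge quota.
    print(quotas_fit(40 * GIB, {"toolforge": 100 * GIB}))   # quota exceeds the volume
    # After expanding to 100G, a 90G quota leaves the 10% reserve intact.
    print(quotas_fit(100 * GIB, {"toolforge": 90 * GIB}))
```

With a check like this, a quota that exceeds the volume shows up at configuration time rather than as a full disk.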
I probably forgot to re-enable that for some reason. Checking.
Also making a patch to reduce the quota.
We should also remember that this happened in the first place because the retention policy keeps more artifacts than the available space can accommodate.
So while reducing the quota will ensure the whole harbor won't fail again, it won't stop us from having other issues in the future (for example, failing to push an image because the quota has been exceeded, as you accurately pointed out).
We'll still need to decide whether to reduce the number of artifacts being retained or increase the storage.
raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/875
[maintain-harbor] reduce toolforge project quota
We had a chat and decided to start with just expanding the volume to 100G; if that's not enough we'll review the policies :)
raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/63
[toolsbeta-harbor] expand registry volume size
raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/63
[toolsbeta-harbor] expand registry volume size
Mentioned in SAL (#wikimedia-cloud-feed) [2025-07-14T14:25:07Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.openstack.quota_increase (T398715)
Mentioned in SAL (#wikimedia-cloud-feed) [2025-07-14T14:25:13Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T398715)
raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/875
[maintain-harbor] reduce toolforge project quota
