
toolsbeta harbor disk full
Closed, Resolved · Public

Description

The toolsbeta harbor has filled up its disk.


Event Timeline

taavi triaged this task as High priority.
Restricted Application added a subscriber: Aklapper.

I managed to get disk usage down to 50% by manually cleaning up some images and running garbage collection. (It took some fiddling to get GC to run: GC needs Redis, and Redis was down because of insufficient disk space. Classic chicken and egg.)
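Once enough space has been freed for Redis to come back, GC can also be triggered over the API instead of the UI. A minimal sketch that only builds the request, assuming Harbor's v2.0 REST API endpoint `POST /api/v2.0/system/gc/schedule` with a `Manual` schedule type and a `delete_untagged` parameter (check the API reference for the Harbor version actually deployed). The base URL and auth header are placeholders:

```python
import json
import urllib.request


def build_manual_gc_request(
    harbor_url: str, auth_header: str, delete_untagged: bool = True
) -> urllib.request.Request:
    """Build, but do not send, a request asking Harbor to run GC now.

    Assumption: Harbor v2.0 API, POST /api/v2.0/system/gc/schedule with a
    "Manual" schedule type. GC depends on Redis, so Redis must be up first.
    """
    body = {
        "schedule": {"type": "Manual"},
        # Also remove artifacts that no longer have any tag.
        "parameters": {"delete_untagged": delete_untagged},
    }
    return urllib.request.Request(
        url=harbor_url.rstrip("/") + "/api/v2.0/system/gc/schedule",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )


# Placeholder host and credentials; send it with urllib.request.urlopen(req).
req = build_manual_gc_request("https://harbor.example.org", "Basic <base64 user:pass>")
```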

root@toolsbeta-harbor-1:/srv/ops/harbor# df -h
Filesystem      Size  Used Avail Use% Mounted on
...
/dev/sdb         40G   19G   19G  50% /srv/ops/harbor/data
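Note that Size (40G) is larger than Used + Avail (19G + 19G): df computes Use% as used / (used + available), rounded up, and the roughly 2G gap is space reserved for root, which counts in neither column. A small sketch of a check that reproduces that percentage for a mount point; the 80% threshold is an arbitrary example, not an existing alert:

```python
import shutil


def use_percent(used: int, avail: int) -> int:
    """Use% as GNU df reports it: used / (used + avail), rounded up.

    Space reserved for root is in neither argument, so the result is
    relative to what unprivileged processes can actually use.
    """
    total = used + avail
    if total == 0:
        return 0
    return (100 * used + total - 1) // total  # integer ceiling division


def check_mount(path: str, threshold: int = 80) -> tuple[int, bool]:
    """Return (Use%, at_or_over_threshold) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # usage.free excludes root-reserved blocks
    pct = use_percent(usage.used, usage.free)
    return pct, pct >= threshold


print(use_percent(19, 19))  # 50, matching the df line above
```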

By default maintain-harbor handles image cleanup, but it seems to be retaining more images than the storage can hold (a configuration matter, not a bug).
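For context, a "keep the N most recently pushed artifacts per repository" retention rule boils down to something like the sketch below. The `Artifact` type and the `keep_latest` parameter are hypothetical; this is not maintain-harbor's actual code. Also keep in mind that deleting artifacts does not free disk space by itself; the space is only reclaimed once garbage collection runs.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Artifact:
    # Hypothetical record; Harbor's real artifact model has many more fields.
    repository: str
    digest: str
    pushed: datetime


def artifacts_to_delete(artifacts: list[Artifact], keep_latest: int) -> list[Artifact]:
    """Return every artifact outside the `keep_latest` newest of its repository."""
    by_repo: dict[str, list[Artifact]] = defaultdict(list)
    for artifact in artifacts:
        by_repo[artifact.repository].append(artifact)

    to_delete: list[Artifact] = []
    for repo_artifacts in by_repo.values():
        # Newest first, so everything past index keep_latest is surplus.
        repo_artifacts.sort(key=lambda a: a.pushed, reverse=True)
        to_delete.extend(repo_artifacts[keep_latest:])
    return to_delete
```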

What we can do next is:

  1. Either change the maintain-harbor configuration to retain fewer images (which is effectively what I did by manually cleaning up more images than maintain-harbor is configured to clean up)
  2. Or increase storage from 40GiB to maybe 100GiB.
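A back-of-the-envelope way to compare the two options: given a number of repositories and a rough average image size, how many artifacts per repository fit on the volume? All numbers below (50 repositories, 200 MiB per image, 20% headroom) are made up for illustration. Layers shared between images are stored only once, so real usage is usually lower than this pessimistic estimate.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def max_keep_latest(
    volume_bytes: int, n_repos: int, avg_image_bytes: int, headroom_pct: int = 20
) -> int:
    """Largest per-repository retention count that fits while keeping headroom_pct% free.

    Pessimistic: assumes no layer sharing between images.
    """
    usable = volume_bytes * (100 - headroom_pct) // 100
    return usable // (n_repos * avg_image_bytes)


for size_gib in (40, 100):
    n = max_keep_latest(size_gib * GiB, n_repos=50, avg_image_bytes=200 * MiB)
    print(f"{size_gib} GiB -> keep {n} per repository")
```

With these made-up inputs it prints 3 for 40 GiB and 8 for 100 GiB.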

For now things should be back to normal. On Monday we'll decide which direction to take.

Raymond_Ndibe changed the task status from Open to In Progress. (Jul 4 2025, 6:24 PM)

The retention rules are disabled in toolsbeta for some reason :/ Let's re-enable them unless someone was testing something specific:

[screenshot: image.png]

We can also adapt the quotas to match the available space. For example, toolforge has a 100G quota right now; if we make it a bit smaller than the disk size, excess space usage would make pushes to harbor fail instead of taking down the whole harbor.
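The condition we want is that project quotas trip before the disk fills. Since the Redis outage earlier in this task suggests other harbor components also need room on that disk, it helps to keep some space in reserve. A sketch, where the 5 GiB reserve is an arbitrary example and a negative quota stands for "unlimited" (Harbor represents unlimited as -1):

```python
GiB = 1024 ** 3


def quotas_fit(disk_bytes: int, project_quotas: dict[str, int], reserved_bytes: int) -> bool:
    """True if every project can hit its quota while `reserved_bytes` stay free.

    A negative quota means unlimited, which can never be guaranteed to fit.
    """
    if any(q < 0 for q in project_quotas.values()):
        return False
    return sum(project_quotas.values()) + reserved_bytes <= disk_bytes


reserve = 5 * GiB  # hypothetical space kept for Redis, the database, logs, ...
print(quotas_fit(40 * GiB, {"toolforge": 100 * GiB}, reserve))   # False: disk fills first
print(quotas_fit(40 * GiB, {"toolforge": 30 * GiB}, reserve))    # True: pushes fail first
print(quotas_fit(100 * GiB, {"toolforge": 100 * GiB}, reserve))  # False even on a 100G disk
```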

I probably forgot to re-enable that for some reason; checking.
Also making a patch to reduce the quota.

We should also remember that this happened in the first place because the retention policy keeps more artifacts than the available space can accommodate.
So while reducing the quota will ensure the whole harbor won't fail again, it won't stop us from having other issues in the future (for example, pushes failing because the quota has been exceeded, as you rightly pointed out).
We'll still need to decide whether to reduce the number of artifacts being retained or to increase the storage.

We had a chat and decided to start by just expanding the volume to 100G; if that's not enough, we'll review the policies :)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-07-14T14:25:07Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.openstack.quota_increase (T398715)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-07-14T14:25:13Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T398715)