
Archiva's disk partition space is getting filled up
Closed, ResolvedPublic

Assigned To
Authored By
elukey
Mar 20 2022, 8:34 AM

Description

The disk partition on the archiva1002 VM is getting full:

elukey@archiva1002:/var/lib/archiva/repositories$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        94G   82G  7.3G  92% /
elukey@archiva1002:/var/lib/archiva/repositories$ sudo du -hs * | sort -h
68K	internal
1.9M	mirror-spark
2.6M	analytics-old-uploads
809M	python
2.4G	mirror-cloudera
7.4G	snapshots
12G	mirror-maven-central
46G	releases

This doesn't seem to be a sudden change, but rather steady growth over time. We should probably either add another disk to the VM (or expand the current one), or clean up unused jars/artifacts.

Event Timeline

Yes, I see. Thanks @elukey

We should probably either add another disk to the VM (or expand the current one) or clean up unused jars/artifacts.

My first instinct would be to grow the disk, but it seems that the current guidance is to avoid doing this with Ganeti if at all possible.
https://wikitech.wikimedia.org/wiki/Ganeti#Adding_disk_space

Adding space to an existing disk is possible. But it is not advisable. Only do so sparingly and when you know it's the best course of action

We could add another disk for /var/lib/archiva, but then we would have to change the partman recipe as well, which is fine, I suppose. We could probably exclude the new disk from being formatted, to ease reimaging and upgrades later on.

However, I wonder whether we should just delete some old artifacts.

There are three specific artifacts that are using the majority of the space:

image.png (140×606 px, 14 KB)

image.png (133×609 px, 12 KB)

If we delete most of the old versions from these repositories, I think that the space will be OK for the time being.

I haven't found any way of doing this automatically at the moment though.

@BTullis in the past I've done it via the Archiva UI. If you are an admin (and you should be, given your LDAP credentials) you also have the option of dropping artifacts. It is a bit tedious, IIRC, since there is no bulk delete, but it is effective for cleaning up space. If old versions of the above can be dropped, I think it is a good way to go.

@dcausse could you please check the wikidata artifacts and let us know if we can drop old versions (and if so, up to which one :).

BTullis triaged this task as Medium priority.

I've manually removed all but the last 10 releases from most of the org.wikimedia.analytics.refinery projects, so this has dropped the usage markedly.

btullis@archiva1002:~$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        94G   73G   17G  82% /

The usage is still just above 80% though, so I think it would be best to do this with the wikidata artifacts as well.
However, I'll wait for approval from @dcausse before proceeding with that.
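For the record, the manual pruning above can be approximated on the filesystem. A dry-run sketch, assuming each artifact directory contains plainly named dotted-version subdirectories (the repository path and the keep-count of 10 come from this task):

```shell
#!/usr/bin/env bash
# Dry-run pruning sketch: print an "rm -rf" for all but the newest N
# version directories of each artifact under a group path.
set -u

prune_old_versions() {
  local repo_group_path="$1" keep="$2"
  local artifact_dir version
  for artifact_dir in "$repo_group_path"/*/; do
    [ -d "$artifact_dir" ] || continue
    # Sort versions oldest-first with sort -V; keep the newest $keep.
    ls -1 "$artifact_dir" | sort -V | head -n -"$keep" \
      | while read -r version; do
          echo rm -rf "$artifact_dir$version"
        done
  done
}

# Example (dry run), using the path from this task:
# prune_old_versions /var/lib/archiva/repositories/releases/org/wikimedia/analytics/refinery 10
```

Dropping the `echo` would make it destructive, so it should only be run once the output has been reviewed.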

Nice! Something worth following up on is the Archiva retention rules; in theory we should have some auto-clean-up of old artifacts. Maybe there is some misconfiguration? (Or quite possibly we have never set it up :D)

Something worth following up on is the Archiva retention rules; in theory we should have some auto-clean-up of old artifacts

Yes, this is interesting. I've started looking into it, but I don't yet understand the concept of releases and snapshots in Archiva.
We have a releases repository which does not have snapshots enabled (and also vice versa: we have a snapshots repository without releases).
The options to prune repositories automatically by date or by number only seem to apply to pruning snapshots. https://archiva.wikimedia.org/#repositorylist
I'll try to find some more information about these features and the recommended workflows.

image.png (593×777 px, 47 KB)

I've manually removed all but the last 10 releases from most of the org.wikimedia.analytics.refinery

We should check to see which ones are still referenced. These artifacts are added to refinery/artifacts and deployed to targets via git-fat rsync from archiva.wikimedia.org. If the files are in the refinery repo, they may be pulled from archiva, and if the artifacts are missing there, the deploy will fail.
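That cross-check could be scripted by listing which checksums the refinery repo's git-fat stubs still reference. A hedged sketch, assuming the standard git-fat stub format (`#$# git-fat <sha1> <size>`); the artifacts path in the example is an assumption:

```shell
#!/usr/bin/env bash
# Sketch: print the checksums that git-fat stub files still reference,
# so they can be cross-checked against what remains in Archiva.
# Assumes the standard git-fat stub format: "#$# git-fat <sha1> <size>".
set -u

list_referenced_fat_objects() {
  local artifacts_dir="$1" stub
  for stub in "$artifacts_dir"/*.jar; do
    [ -f "$stub" ] || continue
    # Only small stub files, not real jars, start with the git-fat magic.
    if head -c 11 "$stub" | grep -q '#\$# git-fat'; then
      awk '{print $3, FILENAME}' "$stub"
    fi
  done
}

# Example (path is an assumption):
# list_referenced_fat_objects /srv/deployment/analytics/refinery/artifacts
```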

Additionally, which old versions of these jars are used is mostly recorded in puppet. A search for references to refinery-(core|hive|spark|cassandra|tools) should show the versions that are referenced.
There might also be versioned references in refinery's oozie/ directory, and now possibly in the airflow-dags repository too.
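A sketch of that search, with the checkout paths in the example being assumptions:

```shell
#!/usr/bin/env bash
# List version-pinned references to the refinery jars across a set of
# checkouts (puppet, refinery's oozie/ directory, airflow-dags, ...).
set -u

list_refinery_versions() {
  grep -rEho 'refinery-(core|hive|spark|cassandra|tools)-[0-9]+\.[0-9]+\.[0-9]+' \
    "$@" 2>/dev/null | sort -u
}

# Example (paths are assumptions):
# list_refinery_versions ~/puppet ~/refinery/oozie ~/airflow-dags
```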

Any old version not used by a running job can be deleted. But if we delete an old version that a current job uses, the job may fail if we cannot deploy and restart it.

I am restoring the deleted files on archiva1002 in order to fix issues that affect deployed oozie jobs.

3,086 files selected to be restored.

Run Restore job
JobName:         RestoreFiles
Bootstrap:       /var/lib/bacula/backup1001.eqiad.wmnet.restore.3.bsr
Where:           /var/tmp/bacula-restores
Replace:         Always
FileSet:         root
Backup Client:   archiva1002.wikimedia.org-fd
Restore Client:  archiva1002.wikimedia.org-fd
Storage:         backup1001-FileStorageProduction
When:            2022-03-21 13:30:44
Catalog:         production
Priority:        1
Plugin Options:  *None*
OK to run? (yes/mod/no):

Should we consider this use case for archiva, or consider another place to distribute this software?

Hm, up for discussion, but I'd say: use something else for 3rd parties (maven central?) but use archiva for any internal deployments.

I have restored all of the deleted files from the refinery group.
They appear in the UI again and we are back up to 92% of the disk's capacity. :-)
Testing now to see whether the symlinks need recreating.
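One way to test whether symlinks need recreating is GNU find's `-xtype l`, which matches links whose targets are missing. A small sketch (the repository path in the example comes from this task):

```shell
#!/usr/bin/env bash
# List symlinks whose targets no longer exist (GNU find's -xtype l),
# e.g. to check whether git-fat links need recreating after a restore.
set -u

find_broken_symlinks() {
  find "$1" -xtype l
}

# Example: find_broken_symlinks /var/lib/archiva/repositories
```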

odimitrijevic raised the priority of this task from Medium to High. Mar 23 2022, 4:34 AM
odimitrijevic moved this task from Incoming (new tickets) to Ops Week on the Data-Engineering board.

For other wdqs related artifacts (they're mostly for internal use) https://archiva.wikimedia.org/#artifact/org.wikidata.query.rdf/blazegraph, https://archiva.wikimedia.org/#artifact/org.wikidata.query.rdf/rdf-spark-tools, https://archiva.wikimedia.org/#artifact/org.wikidata.query.rdf/streaming-updater-producer, https://archiva.wikimedia.org/#artifact/org.wikidata.query.rdf/tools and https://archiva.wikimedia.org/#artifact/org.wikidata.query.rdf/streaming-updater-consumer

everything < 0.3.80
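Applying the "< 0.3.80" rule mechanically needs a version-aware comparison, since plain lexical sorting would put 0.3.9 after 0.3.80. A sketch using GNU `sort -V`, assuming version-directory names are plain dotted versions:

```shell
#!/usr/bin/env bash
# Filter stdin to only the versions strictly below a cut-off, comparing
# with GNU sort -V so that e.g. 0.3.9 orders before 0.3.80.
set -u

versions_below() {
  local cutoff="$1" v first
  while read -r v; do
    [ "$v" = "$cutoff" ] && continue
    # If v sorts before the cut-off in version order, it is older.
    first=$(printf '%s\n%s\n' "$v" "$cutoff" | sort -V | head -n1)
    [ "$first" = "$v" ] && echo "$v"
  done
}

# Example: ls -1 <artifact dir> | versions_below 0.3.80
```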

This is also done now. Space used on / is down to 85%:

btullis@archiva1002:~$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        94G   76G   14G  85% /

I notice that you didn't include https://archiva.wikimedia.org/#artifact~releases/org.wikidata.query.rdf/blazegraph-service in that list of releases to prune @dcausse
Should I drop anything below 0.3.80 in there as well?

I notice that you didn't include https://archiva.wikimedia.org/#artifact~releases/org.wikidata.query.rdf/blazegraph-service in that list of releases to prune @dcausse
Should I drop anything below 0.3.80 in there as well?

I did not include it on purpose, because those are the artifacts that are currently exposed and probably linked from third parties, so deleting them would break those links.

I did not include it on purpose, because those are the artifacts that are currently exposed and probably linked from third parties, so deleting them would break those links.

Great, thanks for the clarification. I guess I can close this ticket then, as the usage has dropped below the Icinga warning threshold. It hasn't gained us a lot of headroom, though, so I expect to have to come back to this at a later date.

image.png (875×1 px, 86 KB)