
Purge old files on Archiva to free some space
Closed, Resolved · Public

Description

While reviewing what is needed to migrate Archiva to Buster, I noticed the following:

elukey@archiva1001:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb1        98G   85G  8.4G  92% /var/lib/archiva
elukey@archiva1001:/var/lib/archiva/repositories$ sudo du -hs *
68K	internal
16G	mirrored
665M	python
34G	releases
34G	snapshots

Config for Releases: [screenshot]

Config for Snapshots: [screenshot]

Given a timestamp like the one below, the config is not really purging anything:

elukey@archiva1001:/var/lib/archiva/repositories/snapshots$ ls -ld ./org/wikimedia/analytics/refinery/job/refinery-job/0.0.14-SNAPSHOT
drwxr-xr-x 2 archiva archiva 4096 Jun 18  2015 ./org/wikimedia/analytics/refinery/job/refinery-job/0.0.14-SNAPSHOT

https://archiva.apache.org/docs/2.2.4/adminguide/repositories.html

I have a limited understanding of Archiva, but what I'd do is something like the following (a quick dry-run check is sketched after the list):

  • set retention days to 0 to disable it (and rely only on the option below)
  • set retention count to something like 3/5
  • test enabling the "delete released snapshot" button to see if any clean up is triggered
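
As a quick dry-run before changing anything, something like the following (a minimal sketch; the 30-day cutoff is only an example) would show which snapshot directories haven't been touched recently and would be candidates for purging:

# list SNAPSHOT directories that have not been modified in the last 30 days
# (cutoff chosen only for illustration, not what Archiva is configured with)
sudo find /var/lib/archiva/repositories/snapshots \
  -type d -name '*-SNAPSHOT' -mtime +30 -print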

Thoughts?

Event Timeline

elukey created this task. Jun 9 2020, 8:34 AM

Sounds good!

Mentioned in SAL (#wikimedia-operations) [2020-06-09T14:00:53Z] <elukey> update release repository's settings on Archiva - T254849

elukey added a comment (edited). Jun 9 2020, 3:23 PM

Enabled the repository-purge consumer in the "Repository Scanning" tab; according to the documentation it should help in purging data. For some reason I can't kick off a manual run of the artifact scanning, so I'll wait for the hourly one.
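
To verify whether the hourly scan actually reclaims space, a minimal sketch (just sampling du once per hour, nothing Archiva-specific):

# print the size of the snapshots repo once per hour
while true; do
  date
  sudo du -sh /var/lib/archiva/repositories/snapshots
  sleep 3600
done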

elukey added a comment. Jun 9 2020, 5:21 PM

Better but I still see a lot of old files:

elukey@archiva1001:/var/lib/archiva/repositories$ sudo du -hs *
68K	internal
16G	mirrored
665M	python
34G	releases
9.1G	snapshots
elukey closed this task as Resolved. Jun 10 2020, 8:41 AM
elukey claimed this task.

While reading the documentation I discovered that snapshot artifacts are the ones easily deletable via settings, while released ones seem to be droppable only by manually clicking in the UI. I dropped refinery jars older than version 0.0.115 and got down to ~55G of usage:

elukey@archiva1001:/var/lib/archiva/repositories$ du -hsc * | sort -h
68K	internal
665M	python
7.4G	snapshots
16G	mirrored
21G	releases
44G	total
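
For the released artifacts that can only be removed via the UI, a small sketch like this (the path, which mirrors the snapshot layout above, and the 0.0.115 cutoff are assumptions used only for illustration) helps list which version directories fall below the cutoff before deleting them by hand:

# list refinery-job release versions below 0.0.115
sudo ls /var/lib/archiva/repositories/releases/org/wikimedia/analytics/refinery/job/refinery-job/ \
  | grep -E '^0\.0\.[0-9]+$' \
  | sort -V \
  | awk -F. '$3 < 115'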

Not going to spend more time on this, looks good for the moment.

elukey set Final Story Points to 5. Jun 10 2020, 8:41 AM
elukey reopened this task as Open. Jun 11 2020, 5:48 PM
elukey added a comment (edited). Jun 11 2020, 5:55 PM

Of course I made a mistake, namely not checking the artifacts that we explicitly reference in refinery. I assumed that keeping only artifacts from 0.0.115 onward would be enough, and I got greedy freeing space; meanwhile, the following are still referenced:

for jar in $(find artifacts/org/wikimedia/analytics/refinery/refinery-*.jar | cut -d "/" -f 6); do git grep $jar | egrep -v '0.0.11[5-9]|0.0.12[0-9]'; done
oozie/cassandra/bundle.properties:refinery_cassandra_jar_path        = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_mediarequest_per_file_daily.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_mediarequest_per_referer_daily.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_mediarequest_per_referer_monthly.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_mediarequest_top_files_daily.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_mediarequest_top_files_monthly.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_pageview_per_article_daily.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_pageview_per_project_daily.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_pageview_per_project_hourly.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_pageview_per_project_monthly.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_pageview_top_articles_daily.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_pageview_top_articles_monthly.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_pageview_top_bycountry_monthly.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_unique_devices_daily.properties:refinery_cassandra_jar_path        = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/coord_unique_devices_monthly.properties:refinery_cassandra_jar_path        = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/historical/pagecounts_per_project.properties:refinery_cassandra_jar_path       = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-0.0.107.jar
oozie/cassandra/bundle.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_mediarequest_per_file_daily.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_mediarequest_per_referer_daily.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_mediarequest_per_referer_monthly.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_mediarequest_top_files_daily.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_mediarequest_top_files_monthly.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_pageview_per_article_daily.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_pageview_per_project_daily.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_pageview_per_project_hourly.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_pageview_per_project_monthly.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_pageview_top_articles_daily.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_pageview_top_articles_monthly.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_pageview_top_bycountry_monthly.properties:refinery_hive_jar_path            = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_unique_devices_daily.properties:refinery_hive_jar_path             = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/coord_unique_devices_monthly.properties:refinery_hive_jar_path             = ${refinery_directory}/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar
oozie/cassandra/historical/mediarequest/daily/backfill_mediarequest_per_file.hql:-- -Drefinery_hive_jar_path=hdfs://analytics-hadoop/wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.101.jar \
oozie/cassandra/historical/mediarequest/daily/backfill_mediarequest_per_referer.hql:-- -Drefinery_hive_jar_path=hdfs://analytics-hadoop/wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.101.jar \
oozie/cassandra/historical/mediarequest/daily/backfill_mediarequest_tops.hql:-- -Drefinery_hive_jar_path=hdfs://analytics-hadoop/wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.101.jar \
oozie/cassandra/historical/mediarequest/monthly/backfill_mediarequest_per_referer.hql:-- -Drefinery_hive_jar_path=hdfs://analytics-hadoop/wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.101.jar \
oozie/cassandra/historical/mediarequest/monthly/backfill_mediarequest_tops.hql:-- -Drefinery_hive_jar_path=hdfs://analytics-hadoop/wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.101.jar \
oozie/mediacounts/load/insert_hourly_mediacounts.hql:ADD JAR ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-hive-0.0.41.jar;
oozie/unique_devices/per_project_family/daily/coordinator.properties:refinery_hive_jar                 = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-hive-0.0.46.jar
oozie/unique_devices/per_project_family/monthly/coordinator.properties:refinery_hive_jar                 = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-hive-0.0.46.jar
oozie/cassandra/daily/pageview_top_articles.hql:--         -d refinery_hive_jar_path=hdfs://analytics-hadoop/wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.53.jar \
oozie/cassandra/monthly/pageview_top_articles.hql:--         -d refinery_hive_jar_path=hdfs://analytics-hadoop/wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.53.jar \
oozie/interlanguage/daily/interlanguage_navigation.hql:-- example: ADD JAR hdfs://analytics-hadoop/wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.53.jar;
oozie/apis/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.105.jar
oozie/clickstream/coordinator.properties:spark_app_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.105.jar
oozie/mediawiki/history/check_denormalize/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.105.jar
oozie/mediawiki/history/denormalize/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.105.jar
oozie/mediawiki/history/reduced/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.105.jar
oozie/mobile_apps/session_metrics/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.105.jar
oozie/webrequest/subset/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.105.jar
oozie/wikidata/coeditors_metrics/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.105.jar
oozie/mediawiki/wikitext/current/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.114.jar
oozie/mediawiki/wikitext/history/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.114.jar
oozie/wikidata/json_entity/weekly/coordinator.properties:spark_job_jar                     = ${artifacts_directory}/org/wikimedia/analytics/refinery/refinery-job-0.0.114.jar
oozie/mediawiki/history/reduced/coordinator.properties:#   -Dspark_job_jar='hdfs://analytics-hadoop/user/milimetric/refinery-job-0.0.93.jar' \

We have a couple of options:

  1. we fix the above to use artifacts with version >= 0.0.115, cleaning up the old jars that we don't need in the refinery repo. This avoids any scap/git-fat deployment issue and does not require an immediate restart of any job (the ones running keep referencing the HDFS version of refinery that they "link" to, which still contains the old jars they need).
  2. we roll back /var/lib/archiva from Bacula (need to verify, but it should be possible), restoring the state before the deletes but possibly losing artifacts uploaded in the past two days (something relatively easy to check between us and Search..)

I'd prefer to do 1), even if it's a little more tedious, since it gives us an excuse to purge old jars and to verify that our jobs still work with the newer versions. I don't see a lot of risk in doing this (and I expect few problems), but the team as a whole should make the call.
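
For 1), the change is mostly a mechanical version bump; a minimal sketch of the idea (one artifact shown, 0.0.115 as the target; the other artifacts would be handled the same way):

# in a refinery checkout: point every oozie config at the newer jar
git grep -l 'refinery-cassandra-0.0.107.jar' -- oozie/ \
  | xargs sed -i 's/refinery-cassandra-0\.0\.107\.jar/refinery-cassandra-0.0.115.jar/g'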

Really sloppy mistake, sorry :(

Nuria added a comment. Jun 11 2020, 6:04 PM

There is a 3rd option, right? We get all the older jars we are missing from HDFS and upload them to Archiva. In that scenario: 1) no job restarts are needed, 2) no code changes are needed. Does this seem like a viable option?
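
A minimal sketch of what that re-upload could look like (the Maven coordinates, repository id and URL are illustrative and would need to match the real Archiva setup; the HDFS path is the one referenced in the oozie configs above):

# copy a jar out of HDFS and push it back to Archiva
hdfs dfs -get /wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.100.jar .
# repositoryId must match a <server> entry (with credentials) in ~/.m2/settings.xml
mvn deploy:deploy-file \
  -Dfile=refinery-hive-0.0.100.jar \
  -DgroupId=org.wikimedia.analytics.refinery.hive \
  -DartifactId=refinery-hive \
  -Dversion=0.0.100 \
  -Dpackaging=jar \
  -DrepositoryId=archiva-releases \
  -Durl=https://archiva.wikimedia.org/repository/releases/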

I wouldn't go for option 2, as the rollback would really take us backward (less available space, no cleanup, etc.).
About options 1 and 3, I prefer 1 for two reasons:

  • Enforced cleanup (Archiva, our code, etc.)
  • Faster/easier deploys of refinery with scap - the reason the deployment of refinery is so slow is that every time it downloads all the referenced jars from Archiva, on every host (last time I did it, it took more than 20 minutes and there was a failure on one host due to a full disk). I really think it's a good idea to move toward retaining fewer historical jars in refinery.

While I'd happily triple-check that the code the jobs use has not changed between the jar versions, in order to mitigate potential errors, I see the breaking potential of updating jar versions, and I'd be OK with re-uploading only the needed ones.
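
To triple-check that, a minimal sketch for comparing the contents of two versions of a jar (file names are only examples; any differing class shows up as "Binary files ... differ"):

# unpack both jars and compare their contents
mkdir -p /tmp/old /tmp/new
unzip -q -o refinery-hive-0.0.100.jar -d /tmp/old
unzip -q -o refinery-hive-0.0.115.jar -d /tmp/new
diff -r /tmp/old /tmp/new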

Change 605167 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] oozie: move cassandra config to refinery-cassandra-0.0.115

https://gerrit.wikimedia.org/r/605167

Change 605168 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] oozie: move cassandra config to use refinery-hive-0.0.115.jar

https://gerrit.wikimedia.org/r/605168

Change 605169 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] oozie: use refinery-hive-0.0.115.jar in cassandra hql files

https://gerrit.wikimedia.org/r/605169

Change 605170 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] oozie: update refinery-job jars for mediawiki history coordinators

https://gerrit.wikimedia.org/r/605170

Change 605171 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] oozie: update refinery-hive jar for unique_devices per project fam coordinators

https://gerrit.wikimedia.org/r/605171

Change 605172 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] oozie: update refinery-hive jar in insert_hourly_mediacounts.hql

https://gerrit.wikimedia.org/r/605172

Change 605173 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] oozie: move remaining coordinator properties to use refinery-job 0.0.115

https://gerrit.wikimedia.org/r/605173

Change 605174 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] Remove artifacts that are not needed anymore.

https://gerrit.wikimedia.org/r/605174

The above patches are only an idea/proposal if we want to unblock option 1), but no strong opinion :)

If we decide to use that strategy I'd like to take the time to check for no-diff in code between jar versions updated in the patches.

I offer my help/time if needed!

Change 605167 merged by Elukey:
[analytics/refinery@master] oozie: move cassandra config to refinery-cassandra-0.0.115

https://gerrit.wikimedia.org/r/605167

Change 605168 merged by Elukey:
[analytics/refinery@master] oozie: move cassandra config to use refinery-hive-0.0.115.jar

https://gerrit.wikimedia.org/r/605168

Change 605169 merged by Elukey:
[analytics/refinery@master] oozie: use refinery-hive-0.0.115.jar in cassandra hql files

https://gerrit.wikimedia.org/r/605169

Change 605170 merged by Elukey:
[analytics/refinery@master] oozie: update refinery-job jars for mediawiki history coordinators

https://gerrit.wikimedia.org/r/605170

Change 605171 merged by Elukey:
[analytics/refinery@master] oozie: update refinery-hive jar for unique_devices per project fam coordinators

https://gerrit.wikimedia.org/r/605171

Change 605172 merged by Elukey:
[analytics/refinery@master] oozie: update refinery-hive jar in insert_hourly_mediacounts.hql

https://gerrit.wikimedia.org/r/605172

Change 605173 merged by Elukey:
[analytics/refinery@master] oozie: move remaining coordinator properties to use refinery-job 0.0.115

https://gerrit.wikimedia.org/r/605173

Change 605174 merged by Elukey:
[analytics/refinery@master] Remove artifacts that are not needed anymore.

https://gerrit.wikimedia.org/r/605174

fdans triaged this task as High priority. Mon, Jun 15, 3:37 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.
elukey added a comment. Thu, Jul 2, 6:55 AM

This is basically done. The jobs to be restarted are listed in https://etherpad.wikimedia.org/p/analytics-weekly-train; we should be able to do it gracefully over the next few weeks.

elukey moved this task from Next Up to Done on the Analytics-Kanban board. Thu, Jul 2, 6:56 AM
Nuria closed this task as Resolved. Mon, Jul 6, 5:51 PM