
Move Archiva to Debian Buster
Closed, ResolvedPublic

Description

Time to move archiva1001 to Debian Buster, and it seems easier to just create a new VM and point the ATS' config to it.

Couple of notes:

  1. From https://archiva.apache.org/download.cgi it seems that Archiva doesn't support Java 11, so we'll have to use 8
  2. Archiva stores its configuration (edited manually) in a file that is not managed by puppet; remember that when working on this. See https://wikitech.wikimedia.org/wiki/Archiva#Administration


Event Timeline

Change 596425 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::archiva: move to profile::java::analytics

https://gerrit.wikimedia.org/r/596425

The disk usage is very interesting:

https://grafana.wikimedia.org/d/000000377/host-overview?panelId=12&fullscreen&orgId=1&refresh=5m&var-server=archiva1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc&from=now-6M&to=now

In the past 5 months we increased the /var/lib/archiva usage by 8%, that should be ~8G. Now we are at ~90% usage, broken down in:

elukey@archiva1001:/var/lib/archiva/repositories$ sudo du -hs *
68K	internal
16G	mirrored
665M	python
33G	releases
34G	snapshots

Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb1        98G   85G  8.9G  91% /var/lib/archiva
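As a quick sanity check of the numbers above, here is a minimal sketch that adds up the per-repository sizes and expresses them as a share of the partition. The figures are copied by hand from the `du` and `df` output; the variable and function names are only illustrative.

```python
# Sizes copied from the `du -hs` output above, converted to GiB.
usage_gib = {
    "internal": 68 / 1024 / 1024,  # 68K
    "mirrored": 16.0,
    "python": 665 / 1024,          # 665M
    "releases": 33.0,
    "snapshots": 34.0,
}
partition_gib = 98.0  # "Size" column of the df output


def share_of_partition(size_gib, total_gib=partition_gib):
    """Return size_gib as a percentage of the partition size."""
    return 100.0 * size_gib / total_gib


total = sum(usage_gib.values())
for name, size in sorted(usage_gib.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {size:6.2f} GiB  {share_of_partition(size):5.1f}%")
print(f"{'total':10s} {total:6.2f} GiB  {share_of_partition(total):5.1f}%")
```

The sum of the repositories comes out slightly below the 85G that `df` reports as used, which is expected since `du` was only run inside `repositories/`. It also shows that 8% of a 98G partition is indeed roughly 8G.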

Before creating a new VM, I would check if we can clean up some data or if we need to keep it all.

@Ottomata @JAllemandou thoughts?

Change 596460 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Factor out java 8 installation into java_8 class

https://gerrit.wikimedia.org/r/596460

Change 596460 merged by Ottomata:
[operations/puppet@production] Factor out java 8 installation into java_8 class

https://gerrit.wikimedia.org/r/596460

Change 596425 merged by Elukey:
[operations/puppet@production] role::archiva: move to profile::java::analytics

https://gerrit.wikimedia.org/r/596425

Some info about the move from meitnerium to archiva1001 for Stretch: https://phabricator.wikimedia.org/T192639

Change 603809 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add piwik overrides for matomo1002 to ease testing

https://gerrit.wikimedia.org/r/603809

Change 603809 merged by Elukey:
[operations/puppet@production] Add piwik overrides for matomo1002 to ease testing

https://gerrit.wikimedia.org/r/603809

Change 604066 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva: move archiva-gitfat-link to systemd timer

https://gerrit.wikimedia.org/r/604066

Change 604066 merged by Elukey:
[operations/puppet@production] archiva: move archiva-gitfat-link to systemd timer

https://gerrit.wikimedia.org/r/604066

Some notes from the other migration:

  • Remember to turn on profile::archiva::proxy::only_localhost when applying the role archiva to archiva1002, so the new installation will be protected from any Internet "visitor". When archiva is installed it asks to set the admin password.
  • Remember to set do_acme: false to prevent archiva1002 from challenging the ACME server and acquiring the Let's Encrypt TLS certificate for archiva.wikimedia.org (we are not ready for that at this step).
  • Apply role(archiva) to archiva1002, and use bacula to pull the latest data to /var/lib/archiva

Change 604691 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Assign role archiva to archiva1002

https://gerrit.wikimedia.org/r/604691

Change 604698 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva::proxy: raise TLS ciphersuite requirements

https://gerrit.wikimedia.org/r/604698

Change 604691 merged by Elukey:
[operations/puppet@production] Assign role archiva to archiva1002

https://gerrit.wikimedia.org/r/604691

Change 604734 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Add archiva-new.wikimedia.org as CNAME to archiva1002

https://gerrit.wikimedia.org/r/604734

Change 605136 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/homer/public@master] Add archiva1002 IPs to analytics-in4/6 filters

https://gerrit.wikimedia.org/r/605136

Change 604734 merged by Elukey:
[operations/dns@master] Add archiva-new.wikimedia.org as CNAME to archiva1002

https://gerrit.wikimedia.org/r/604734

Change 605136 merged by Elukey:
[operations/homer/public@master] Add archiva1002 IPs to analytics-in4/6 filters

https://gerrit.wikimedia.org/r/605136

Change 605186 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva: assign archiva-new.wikimedia.org to archiva1002

https://gerrit.wikimedia.org/r/605186

Change 605186 merged by Elukey:
[operations/puppet@production] archiva: assign archiva-new.wikimedia.org to archiva1002

https://gerrit.wikimedia.org/r/605186

Change 605187 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add archiva-new configuration for Acme Chief

https://gerrit.wikimedia.org/r/605187

Change 605187 merged by Elukey:
[operations/puppet@production] Add archiva-new configuration for Acme Chief

https://gerrit.wikimedia.org/r/605187

Change 605203 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::archiva::proxy: use certificate_name for monitoring

https://gerrit.wikimedia.org/r/605203

Change 605203 merged by Elukey:
[operations/puppet@production] profile::archiva::proxy: use certificate_name for monitoring

https://gerrit.wikimedia.org/r/605203

Change 607989 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Move archiva.wikimedia.org from archiva1001 to archiva1002

https://gerrit.wikimedia.org/r/607989

Change 607989 merged by Elukey:
[operations/puppet@production] Move archiva.wikimedia.org from archiva1001 to archiva1002

https://gerrit.wikimedia.org/r/c/operations/puppet/+/607989

Change 608308 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Update CNAMEs for archiva

https://gerrit.wikimedia.org/r/c/operations/dns/+/608308

Change 608308 merged by Elukey:
[operations/dns@master] Update CNAMEs for archiva

https://gerrit.wikimedia.org/r/c/operations/dns/+/608308

archiva.wikimedia.org now points to archiva1002, and archiva-old.wikimedia.org points to archiva1001. Will keep the latter around for a couple of days in case a quick rollback is needed.

Some changes:

  • mirrored repository deleted
  • created a proxy repository for maven central
  • created a proxy repository for cloudera
  • created a proxy repository for spark
  • created a repository for old analytics dependencies (either manually uploaded to mirrored in the past or no longer available on central/cloudera)
  • created a repo group called "mirrored" with the above repositories.

The changes should be transparent for analytics, but will allow other teams to use proxy repos with more narrow scope.

I think we might be missing some referenced artifacts in refinery.

./hive-jdbc-1.1.0-cdh5.10.0.jar:#$# git-fat 08067db8f8120d408a324159ba981905f041bcfc                96774
./hive-service-1.1.0-cdh5.10.0.jar:#$# git-fat 192b8c280dcd132e6eaa1037e9c655b738925f5a              2067290
./article-recommender/venv-0.0.1.zip:#$# git-fat 167d4c5c0eeb07deb66ca84f61d032b6186605fc             54952360
./article-recommender/venv-0.0.2.zip:#$# git-fat 80e06f209b5a687ab694199ac6ed0d5f9925f1ad             54957315

Deploying refinery is failing because these files are missing.
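For reference, each line in the listing above has the shape `path:#$# git-fat <sha1> <size>`. A minimal sketch for pulling those fields apart, assuming the `path:` prefix comes from grep-style output and that the placeholder always carries a 40-character hex SHA-1 followed by a byte count (the names below are illustrative, not part of git-fat):

```python
import re
from typing import NamedTuple, Optional


class FatPlaceholder(NamedTuple):
    path: str
    sha1: str
    size: int


# "<path>:#$# git-fat <40 hex chars> <size in bytes>"
_LINE = re.compile(
    r"^(?P<path>.+?):#\$# git-fat (?P<sha1>[0-9a-f]{40})\s+(?P<size>\d+)\s*$"
)


def parse_placeholder(line: str) -> Optional[FatPlaceholder]:
    """Parse one listing line; return None if it does not match."""
    m = _LINE.match(line)
    if m is None:
        return None
    return FatPlaceholder(m["path"], m["sha1"], int(m["size"]))


example = (
    "./hive-jdbc-1.1.0-cdh5.10.0.jar:#$# git-fat "
    "08067db8f8120d408a324159ba981905f041bcfc                96774"
)
print(parse_placeholder(example))
```

The SHA-1 is the key that has to be resolvable on the archiva side for the deploy to fetch the real file.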

Lovely :(

./repositories/mirrored/org/apache/hive/hive-jdbc/1.1.0-cdh5.10.0/hive-jdbc-1.1.0-cdh5.10.0.jar.sha1
./repositories/mirrored/org/apache/hive/hive-jdbc/1.1.0-cdh5.10.0/hive-jdbc-1.1.0-cdh5.10.0.jar.md5
./repositories/mirrored/org/apache/hive/hive-jdbc/1.1.0-cdh5.10.0/hive-jdbc-1.1.0-cdh5.10.0.jar

The above files are on archiva1001, so I guess that either we uploaded them manually or the cloudera repository doesn't have them anymore.

Very weird:

elukey@archiva1002:/var/lib/archiva/repositories$ ls mirror-cloudera/org/apache/hive/hive-jdbc
1.1.0-cdh5.15.0  maven-metadata.xml  maven-metadata.xml.md5  maven-metadata.xml.sha

So on archiva1002 we have 5.15, not 5.10 as requested by git-fat. Moreover, the cloudera repo still shows 5.10 available:
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hive/hive-jdbc/1.1.0-cdh5.10.0/

I downloaded the 2 article-recommender venvs from archiva-old and uploaded them to archiva:

Artifacts for 'article-recommender:venv:0.0.1', packaged as 'zip', with no POM Generated, were uploaded and saved on Server side to 'releases' repository.

Artifacts for 'article-recommender:venv:0.0.2', packaged as 'zip', with no POM Generated, were uploaded and saved on Server side to 'releases' repository.

We do indeed need the cdh5.10.0 versions of those hive jars:
https://github.com/wikimedia/analytics-refinery/blob/master/artifacts/hive-cdh5.10.0.README

I downloaded them and uploaded them to the analytics-old-uploads repository:

Artifacts for 'org.apache.hive:hive-service:1.1.0-cdh5.10.0', packaged as 'jar', with no POM Generated, were uploaded and saved on Server side to 'analytics-old-uploads' repository.

Artifacts for 'org.apache.hive:hive-jdbc:1.1.0-cdh5.10.0', packaged as 'jar', with no POM Generated, were uploaded and saved on Server side to 'analytics-old-uploads' repository.

I then forced a run of archiva-gitfat-link and was able to successfully deploy refinery.

Ok I think I know what happened:

  1. I made an rsync of 1001 to 1002, containing the 5.10 artifact in mirrored.
  2. Cleaned up mirrored, and set up separate mirror/proxy repo for central/cloudera/spark. At this point, all mirrored artifacts were gone, waiting to be repopulated on demand when needed.
  3. In refinery we explicitly "link" 5.10 with git-fat. Since nothing had pulled 5.10 in yet, there was no git-fat symlink, hence the failure to deploy refinery.
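To make the failure mode in the steps above concrete, here is a rough sketch of a linking job of this kind. It is not the actual archiva-gitfat-link script: it assumes the job walks the repositories directory and creates one symlink per `.jar`, named after the file's SHA-1, and the function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_artifacts(repo_root: Path, link_dir: Path) -> dict:
    """Create link_dir/<sha1> -> jar for every .jar found under repo_root.

    Assumed behaviour, for illustration only. Existing links are left alone,
    so the function can be re-run safely.
    """
    link_dir.mkdir(parents=True, exist_ok=True)
    linked = {}
    for jar in sorted(repo_root.rglob("*.jar")):
        link = link_dir / sha1_of(jar)
        if not link.is_symlink() and not link.exists():
            link.symlink_to(jar.resolve())
        linked[link.name] = jar
    return linked
```

Under that assumption, an artifact that a proxy repository has not yet fetched simply has no file on disk, so no symlink is created for its SHA-1 and a git-fat lookup by that hash fails until something pulls or uploads the jar and the job runs again.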

@Ottomata I arrived too late, thanks for the fix!

Hm, I spoke too soon: I think the upload didn't quite work the way I expected. Even though I uploaded all the files, there was a .jar and a -standalone.jar, and it looks like the -standalone.jar took precedence over the regular .jar. Will try again.

We had to trash my uploads, and then I re-uploaded ONLY the .jar files we needed, not the extraneous sources or javadocs or poms.

Change 608812 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva::proxy: allow nginx to serve content from repositories

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608812

Summary of my understanding so far:

Possible solutions:

  1. We find out why repo groups are slow and we fix it. So far I have checked all the options I could find without success, so this seems unlikely.
  2. We use "mirrored-analytics" in our pom.xml. Not really clean since we are trying to move away from this model.
  3. We state all the archiva repositories needed explicitly in refinery's pom.xml (or where else needed) and we move away from single "mirrored" repos.

I'd prefer 3) to be honest, it seems cleaner and is compatible with https://gerrit.wikimedia.org/r/c/operations/puppet/+/608812

Change 608872 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Explicit archiva mirrored repositories in pom.xml

https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/608872

Change 608879 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] maven: remove mirrored repository from main settings

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608879

Change 608879 merged by Elukey:
[operations/puppet@production] maven: remove main /etc/maven/settings.xml

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608879

Change 608872 merged by jenkins-bot:
[analytics/refinery/source@master] Explicit archiva mirrored repositories in pom.xml

https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/608872

We moved refinery to use specific repos (mirror-cloudera/mirror-spark/mirror-maven-central/analytics-old-uploads) and the build went fine; everything finally seems to be working. The repo group mirrored is still needed by the Discovery team, so we'll leave it there.
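For anyone doing the same in another project, the following sketch prints a `<repositories>` fragment listing the four repositories explicitly. The repository ids are the ones named above; the base URL layout (`/repository/<id>/`) is an assumption, and the real change to refinery's pom.xml may differ in ordering and extra settings.

```python
from xml.sax.saxutils import escape

# Assumed URL layout; adjust if the Archiva instance exposes repos elsewhere.
BASE_URL = "https://archiva.wikimedia.org/repository"

REPO_IDS = [
    "mirror-maven-central",
    "mirror-cloudera",
    "mirror-spark",
    "analytics-old-uploads",
]


def repositories_xml(repo_ids, base_url=BASE_URL):
    """Build a Maven <repositories> fragment with one entry per repo id."""
    lines = ["<repositories>"]
    for repo_id in repo_ids:
        lines += [
            "  <repository>",
            f"    <id>{escape(repo_id)}</id>",
            f"    <url>{escape(base_url)}/{escape(repo_id)}/</url>",
            "  </repository>",
        ]
    lines.append("</repositories>")
    return "\n".join(lines)


print(repositories_xml(REPO_IDS))
```

Listing each repository separately keeps every lookup scoped to one narrow proxy instead of going through the slower "mirrored" repo group.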

Verified on backup1001 that archiva1002 is correctly using bacula:

Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
===================================================================
240718  Full    319,197    24.28 G  OK       30-Jun-20 04:54 archiva1002.wikimedia.org-Monthly-1st-Thu-production-var-lib-archiva
240865  Incr     34,354    952.2 M  OK       01-Jul-20 04:12 archiva1002.wikimedia.org-Monthly-1st-Thu-production-var-lib-archiva
240979  Full    376,398    25.84 G  OK       02-Jul-20 02:57 archiva1002.wikimedia.org-Monthly-1st-Thu-production-var-lib-archiva
241008  Incr      2,032    248.9 M  OK       02-Jul-20 04:09 archiva1002.wikimedia.org-Monthly-1st-Thu-production-var-lib-archiva
241156  Incr     38,298    926.4 M  OK       03-Jul-20 04:09 archiva1002.wikimedia.org-Monthly-1st-Thu-production-var-lib-archiva
241297  Incr     20,139    974.9 M  OK       04-Jul-20 04:11 archiva1002.wikimedia.org-Monthly-1st-Thu-production-var-lib-archiva
241438  Incr     18,633    755.6 M  OK       05-Jul-20 09:38 archiva1002.wikimedia.org-Monthly-1st-Thu-production-var-lib-archiva
241576  Incr     10,181    753.4 M  OK       06-Jul-20 04:09 archiva1002.wikimedia.org-Monthly-1st-Thu-production-var-lib-archiva
242784  Incr     14,228    918.0 M  OK       07-Jul-20 08:00 archiva1002.wikimedia.org-Monthly-1st-Thu-production-var-lib-archiva

Change 610006 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove archiva1001 from puppet

https://gerrit.wikimedia.org/r/610006

Change 610006 merged by Elukey:
[operations/puppet@production] Remove archiva1001 from puppet

https://gerrit.wikimedia.org/r/610006

cookbooks.sre.hosts.decommission executed by elukey@cumin1001 for hosts: archiva1001.wikimedia.org

  • archiva1001.wikimedia.org (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed

Tried to run the decom cookbook and got:

elukey@cumin1001:~$ sudo cookbook sre.hosts.decommission -t T252767 archiva1001.wikimedia.org
START - Cookbook sre.hosts.decommission
ATTENTION: destructive action for 1 hosts: archiva1001.wikimedia.org
Are you sure to proceed?
Type "done" to proceed
> done
Looking for matches in puppetmaster1001.eqiad.wmnet:/var/lib/git/operations/puppet
hieradata/common/lvs/interfaces.yaml:      'lvs1013': 'enp4s0f1:208.80.154.167'
hieradata/common/lvs/interfaces.yaml:      'lvs1014': 'enp4s0f0:208.80.154.168'
hieradata/common/lvs/interfaces.yaml:      'lvs1015': 'enp4s0f1:208.80.154.169'
hieradata/role/eqiad/wmcs/openstack/eqiad1/labweb.yaml:        host: 208.80.154.160
hieradata/role/eqiad/wmcs/openstack/eqiad1/labweb.yaml:        host: 208.80.154.160
modules/role/templates/mariadb/grants/production-m5.sql.erb:GRANT SELECT, INSERT, UPDATE, DELETE ON striker.* TO 'striker'@'208.80.154.160'
modules/role/templates/mariadb/grants/production-m5.sql.erb:GRANT ALL ON striker.* TO 'striker_admin'@'208.80.154.160'
modules/role/templates/mariadb/grants/production-m5.sql.erb:GRANT ALL ON `labswiki`.* TO 'wikiadmin'@'208.80.154.160'
modules/role/templates/mariadb/grants/production-m5.sql.erb:GRANT DELETE, INSERT, SELECT, UPDATE ON `labswiki`.* TO 'wikiuser'@'208.80.154.160'
Looking for matches in puppetmaster1001.eqiad.wmnet:/srv/private
Looking for matches in deploy1001.eqiad.wmnet:/srv/mediawiki-staging

The above is interesting since 208.80.154.16 (note the missing zero at the end) is the IP of archiva1001 :D
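The matches in the output are consistent with a plain substring search for the IP. A small sketch of the difference between that and a boundary-aware match (the helper name is made up for illustration, and this says nothing about how the cookbook itself is implemented):

```python
import re


def ip_pattern(ip: str) -> "re.Pattern[str]":
    """Match ip only when it is not embedded in a longer run of digits."""
    return re.compile(r"(?<!\d)" + re.escape(ip) + r"(?!\d)")


archiva_ip = "208.80.154.16"
lines = [
    "hieradata/role/eqiad/wmcs/openstack/eqiad1/labweb.yaml:        host: 208.80.154.160",
    "hieradata/common/lvs/interfaces.yaml:      'lvs1013': 'enp4s0f1:208.80.154.167'",
    "example.yaml: host: 208.80.154.16",  # made-up line with the exact IP
]

exact = ip_pattern(archiva_ip)
for line in lines:
    naive_hit = archiva_ip in line            # substring search
    exact_hit = exact.search(line) is not None  # digit-boundary search
    print(f"naive={naive_hit!s:5} exact={exact_hit!s:5} {line}")
```

With the lookarounds in place, only the last (made-up) line matches, while the substring test flags all three.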

Change 610011 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Decommission archiva1001

https://gerrit.wikimedia.org/r/610011

Change 610011 merged by Elukey:
[operations/dns@master] Decommission archiva1001

https://gerrit.wikimedia.org/r/610011

elukey moved this task from In Progress to Done on the Analytics-Kanban board.

archiva1001 decommed, task completed!

Change 659236 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] archiva: disable nginx proxy buffering

https://gerrit.wikimedia.org/r/659236

Change 659236 merged by Elukey:
[operations/puppet@production] archiva: disable nginx proxy buffering

https://gerrit.wikimedia.org/r/659236

Mentioned in SAL (#wikimedia-operations) [2021-01-28T11:30:18Z] <elukey> disable nginx proxy buffering on archiva.wikimedia.org for a perf test - T252767

Change 608812 merged by Elukey:
[operations/puppet@production] archiva::proxy: allow nginx to serve content from repositories

https://gerrit.wikimedia.org/r/608812

Change 604698 abandoned by Elukey:

[operations/puppet@production] archiva::proxy: raise TLS ciphersuite requirements

Reason:

https://gerrit.wikimedia.org/r/604698