Migrate titanium to jessie (archiva.wikimedia.org upgrade)
Closed, Resolved (Public)

Description

Still on precise, migrate to jessie. While at it, migrate to a Ganeti VM (meitnerium)

Event Timeline

MoritzMuehlenhoff raised the priority of this task from to Needs Triage.
MoritzMuehlenhoff updated the task description.
MoritzMuehlenhoff subscribed.

This hosts archiva.wikimedia.org

https://wikitech.wikimedia.org/wiki/Archiva

This indicates the Analytics team should be involved.

Dzahn renamed this task from Migrate titanium to jessie to Migrate titanium to jessie (archiva.wikimedia.org upgrade). Apr 11 2016, 6:28 PM
Dzahn added a project: Analytics-Clusters.

https://wikitech.wikimedia.org/wiki/Analytics/Archiva

@Analytics Does an upgrade of this server to jessie have blockers that are already known?

I don't think archiva is a runtime dependency of anything, but FYI, @Ottomata.

I'm on this. There's already a replacement VM (meitnerium) with archiva installed; the next step is migrating /var/lib/archiva from titanium to the new VM.

akosiaris triaged this task as Medium priority. Aug 11 2016, 2:23 PM

Change 307559 had a related patch set uploaded (by Dzahn):
archiva: migration class to rsync data to new host

https://gerrit.wikimedia.org/r/307559

Change 307559 merged by Dzahn:
archiva: migration class to rsync data to new host

https://gerrit.wikimedia.org/r/307559

I have rsynced the entire /var/lib/archiva from titanium over to meitnerium, the new jessie server.

One single file, conf/archiva.xml, I restored afterwards from a backup, so the config is the new one and all the data is imported.

Now we have the data but still get an Error 503 - Service Unavailable from the new server, even though the archiva service is running and can be stopped and started fine from the command line.

/var/log/archiva/wrapper.log has

INFO | jvm 1 | 2016/08/30 22:19:57 | Caused by:
INFO | jvm 1 | 2016/08/30 22:19:57 | java.io.FileNotFoundException: /var/lib/archiva/data/jcr/.lock (Permission denied)
INFO | jvm 1 | 2016/08/30 22:19:57 | at java.io.RandomAccessFile.open(Native Method)
INFO | jvm 1 | 2016/08/30 22:19:57 | at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
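
For anyone hitting a similar 503, a quick way to see every path the JVM was denied access to is to pull them out of the wrapper log. This is only a sketch: the helper name denied_paths is made up for illustration, and it assumes the log lines look like the excerpt above.

```shell
#!/bin/sh
# denied_paths FILE (hypothetical helper, not part of Archiva)
# Print each distinct path that a FileNotFoundException reported
# as "(Permission denied)" in a wrapper.log-style file.
denied_paths() {
    sed -n 's/.*FileNotFoundException: \(.*\) (Permission denied).*/\1/p' "$1" | sort -u
}

# Example, using the log path mentioned above:
# denied_paths /var/log/archiva/wrapper.log
```

If the list contains more than a lock file, that points at a tree-wide ownership problem rather than a single stale file.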

The issue is caused by the archiva user having a different UID on the old and new servers. The rsync preserved the numeric UID and GID, so on meitnerium the files now belong to whichever accounts happen to have those numbers.

old: uid=108(archiva) gid=112(archiva) groups=112(archiva)

new: uid=113(archiva) gid=118(archiva) groups=118(archiva)

so we get stuff like:

4.0K drwxr-xr-x 2 _lldpd mlocate 4.0K Mar 28 2014 conf
4.0K drwxr-xr-x 5 _lldpd mlocate 4.0K Mar 26 2014 data
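
To see at a glance which owners a copied tree ended up with, something like the following can help. It is a sketch only: owner_summary is a made-up helper name, and it relies on GNU find's -printf, which is an assumption about the host.

```shell
#!/bin/sh
# owner_summary DIR (hypothetical helper)
# Count the entries under DIR per owner. Output columns:
#   count  numeric-uid  user-name  numeric-gid  group-name
# Requires GNU find (-printf is not available in BSD find).
owner_summary() {
    find "$1" -printf '%U %u %G %g\n' | sort | uniq -c
}

# Example:
# owner_summary /var/lib/archiva
```

Any line whose user name is not archiva marks files that kept an ownership number from the old host.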

fix running: root@meitnerium:/var/lib# find /var/lib/archiva/ -uid 108 -exec chown archiva:archiva {} \;
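
A slightly more cautious variant of the same fix, as a sketch: it lists what would change first and only touches ownership when asked to. The name fix_owner and the --apply switch are invented for this example. Unlike the command above, it also matches on the old GID, so files that carry only the stale group are caught too.

```shell
#!/bin/sh
# fix_owner DIR OLD_UID OLD_GID USER GROUP [--apply]   (hypothetical helper)
# Without --apply: print every entry still owned by OLD_UID or OLD_GID.
# With --apply: hand those entries to USER and GROUP.
fix_owner() {
    dir=$1 old_uid=$2 old_gid=$3 user=$4 group=$5 mode=${6:-}
    if [ "$mode" = "--apply" ]; then
        # -h changes symlinks themselves instead of their targets
        find "$dir" -uid "$old_uid" -exec chown -h "$user" {} +
        find "$dir" -gid "$old_gid" -exec chgrp -h "$group" {} +
    else
        find "$dir" \( -uid "$old_uid" -o -gid "$old_gid" \) -print
    fi
}

# Dry run with the IDs from this task, then the real thing (as root):
# fix_owner /var/lib/archiva 108 112 archiva archiva
# fix_owner /var/lib/archiva 108 112 archiva archiva --apply
```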

Fixed, and restarted the service. I now get the Archiva web UI on meitnerium (after hacking my /etc/resolv.conf to point archiva.wikimedia.org to it).
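
An alternative that avoids touching local resolver config is curl's --resolve option, which pins a hostname to an address for a single request. A minimal sketch, with assumptions: the helper names are made up, 192.0.2.10 is a documentation placeholder rather than meitnerium's real address, and HTTPS on port 443 is assumed.

```shell
#!/bin/sh
# resolve_arg HOST PORT IP (hypothetical helper)
# Build the HOST:PORT:ADDRESS value that curl's --resolve expects.
resolve_arg() {
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# check_vhost HOST IP (hypothetical helper)
# Request https://HOST/ from IP and print only the HTTP status code.
check_vhost() {
    curl -sS -o /dev/null -w '%{http_code}\n' \
        --resolve "$(resolve_arg "$1" 443 "$2")" "https://$1/"
}

# Example with a placeholder address:
# check_vhost archiva.wikimedia.org 192.0.2.10
```

A 503 here would reproduce the earlier failure without any DNS change.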

Ok! DNS has been merged, and we just did our first refinery release using jenkins and archiva. ALL IS WELL!

Nice! Let's keep titanium around for another two weeks just in case we need to track something down that wasn't noticed.

Cool, i'll take it as a reminder to shut titanium down after a waiting period.

Change 307900 had a related patch set uploaded (by Dzahn):
Revert "archiva: migration class to rsync data to new host"

https://gerrit.wikimedia.org/r/307900

Change 307900 merged by Dzahn:
Revert "archiva: migration class to rsync data to new host"

https://gerrit.wikimedia.org/r/307900

Change 310596 had a related patch set uploaded (by Dzahn):
archiva/site: remove titanium, fix role syntax

https://gerrit.wikimedia.org/r/310596

Change 310596 merged by Dzahn:
archiva/site: remove titanium, fix role syntax

https://gerrit.wikimedia.org/r/310596

titanium has been replaced by meitnerium. This is done. The remaining decom steps (up to physically removing it from the data center) are handled in a subtask.