Update ruthenium to Debian jessie from Ubuntu 12.04
Closed, Resolved · Public

Description

ruthenium is currently on 12.04. We use it for roundtrip and visual diff testing right now. I wanted to start using upright-diff for running visual diff tests since this provides a more actionable difference metric. However, I couldn't compile it because 12.04 doesn't have the dependencies for it.

What is involved in updating this server to 14.04? We have some custom nginx config, config files for testreduce, visualdiffing, and parsoid on ruthenium. None of these are puppetized. Is that a blocker for updating to 14.04?

Event Timeline

ssastry created this task. Dec 23 2015, 7:30 PM
ssastry updated the task description.
ssastry raised the priority of this task to Normal.
ssastry added a subscriber: ssastry.
Restricted Application added a subscriber: Aklapper. Dec 23 2015, 7:30 PM
ssastry set Security to None.
ssastry renamed this task from Update ruthenium to 14.04 from 12.04 to Update ruthenium to Ubuntu 14.04 from Ubuntu 12.04. Dec 23 2015, 10:04 PM

@ssastry Would it also be ok if we used Debian and re-installed this with jessie?

What perfect timing! :) @mobrovac and I were just discussing this and we figured moving to jessie might be a good idea.

So, I have a few questions for you to help us decide on a timeline:

  • When would you start this?
  • How long would it take?
  • We have a mysql database in /mnt/data ... I assume that is an external drive and that all that data will be retained with the reimaging?
  • There are a couple more puppet pieces to get through; that can also happen post-imaging since most of it is just fine-tuning.
  • I have puppetized most of the services assuming upstart, but once you reimage, it will take us a day or so to get some of those migrated to systemd.

Given that ruthenium is our testing server for parsoid deploys, we should co-ordinate this so that we have a couple days before the next deploy. We were planning to do a deploy on Wednesday (20th), but we could hold off till the following Monday (25th) if necessary.

Once we move to jessie, we can then separately consider a move to node 4.2 (from node 0.10 which we have been testing with and running in production).

@mobrovac tells me that the default nodejs package in jessie is node 4.2 ... so, that is fine.

If we find that we are not ready to migrate to node 4.2 for whatever reason (after testing), we can figure out what is involved in getting node 0.10 there till we resolve any blockers for the node 4.2 move.

> When would you start this?

I could start next Tuesday (19th) or later.

> How long would it take?

The actual install... hmm, maybe an hour. But it largely depends on how much data you need to keep (which we have to copy around) and how many things you have to do manually.

> We have a mysql database in /mnt/data ... I assume that is an external drive and that all that data will be retained with the reimaging?

I'm not sure yet; I can't tell from T83132 how that was set up. @RobH, do you remember?

> There are a couple more puppet pieces to get through; that can also happen post-imaging since most of it is just fine-tuning.

That's T118778?

> I have puppetized most of the services assuming upstart, but once you reimage, it will take us a day or so to get some of those migrated to systemd.

> Given that ruthenium is our testing server for parsoid deploys, we should co-ordinate this so that we have a couple days before the next deploy. We were planning to do a deploy on Wednesday (20th), but we could hold off till the following Monday (25th) if necessary.

I think Wednesday would be too soon, since Monday is a US holiday and that would just leave Tuesday to get everything done.

Dzahn renamed this task from Update ruthenium to Ubuntu 14.04 from Ubuntu 12.04 to Update ruthenium to Debian jessie from Ubuntu 12.04. Jan 16 2016, 12:33 AM

@ssastry The data partition is on a software RAID across 2 physical disks. I'm afraid we'd have to copy all that data elsewhere.

@Dzahn Yes, T118778 is the puppetization task. One of the last pieces to be reviewed and merged is https://gerrit.wikimedia.org/r/#/c/264032/. I haven't done the nginx config bit, but I have copied the config file into ~ssastry/bkp, which I can puppetize later. Besides that, I'll have to migrate the upstart scripts to systemd; I'll probably get @mobrovac's help there. :)

In terms of data migration, besides the mysql dbs, ~ssastry/bkp is the only data that needs saving. If it is simpler, you can do a mysql dump of the 2 databases that are in use (testreduce_0715 and testreduce_vd) instead of all the dbs there. But, maybe it is simpler to save the binary data and restore it. I'll let you make the call on it. Not sure if @GWicke has any data / code in there that he has been using for restbase / services testing.
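If the dump route were taken instead of copying the binary data, a minimal sketch could look like the following. Assumptions: `dump_testreduce_dbs` is a hypothetical helper, credentials come from the MySQL client's usual option files (e.g. `~/.my.cnf`), and setting `DRY_RUN=1` only prints the command. `--single-transaction` yields a consistent snapshot only for InnoDB tables; the storage engine of these databases isn't stated in the task.

```shell
#!/bin/sh
# Hypothetical helper: dump only the named databases instead of copying the
# whole MySQL data directory. Usage: dump_testreduce_dbs OUTFILE DB...
# With DRY_RUN=1 the mysqldump command is printed instead of executed.
dump_testreduce_dbs() {
    out=$1
    shift
    # --databases adds CREATE DATABASE/USE statements to the dump;
    # --single-transaction gives a consistent snapshot for InnoDB tables only.
    set -- mysqldump --single-transaction --databases "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s > %s\n' "$*" "$out"
    else
        "$@" > "$out"
    fi
}
```

For example, `dump_testreduce_dbs /path/to/testreduce.sql testreduce_0715 testreduce_vd` (the output path is a placeholder). Because `--databases` includes the `CREATE DATABASE` and `USE` statements, the file could be replayed on the reinstalled host with `mysql < FILE`.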

We already have code that is tested that we can deploy on Wednesday without needing any more testing. So, you can start this on Tuesday which should give us enough time to troubleshoot and get ruthenium back in shape by end of week. Worst case, we'll not deploy on Monday the 25th.

I think my home directory on ruthenium doesn't contain anything that can't be re-created, and I don't have access to this host any more anyway.

Dzahn added a comment. (Edited) Jan 19 2016, 8:09 PM

The mysql server version before is 5.5.46-0ubuntu0.12.04.2, and after it's going to be jessie with 5.5.46-0+deb8u1, so it's the same upstream release (5.5.46). Therefore I'm going to go with just copying the raw data files rather than using mysqldump. That should be faster.
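The reasoning here is that both packages are the same upstream MySQL release, differing only in the distribution's packaging revision, so the new server should be able to use the old data files directly. A minimal sketch of that check, assuming standard Debian version syntax (`[epoch:]upstream_version[-debian_revision]`); the function names are hypothetical, and a matching upstream version is a heuristic for on-disk compatibility, not a guarantee:

```shell
#!/bin/sh
# upstream_version VERSION: strip an optional epoch ("N:") and the Debian
# revision (everything after the last "-") from a Debian package version.
upstream_version() {
    v=$1
    case $v in
        *:*) v=${v#*:} ;;   # drop epoch, e.g. "1:5.5.46-1" -> "5.5.46-1"
    esac
    case $v in
        *-*) v=${v%-*} ;;   # drop the Debian revision after the last hyphen
    esac
    printf '%s\n' "$v"
}

# raw_copy_ok OLD NEW: succeed when both versions share the upstream release.
raw_copy_ok() {
    [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

if raw_copy_ok 5.5.46-0ubuntu0.12.04.2 5.5.46-0+deb8u1; then
    echo "same upstream release: copy the raw data files"
else
    echo "different upstream release: use mysqldump"
fi
```

On each host the installed version string could be read with `dpkg-query -W -f='${Version}\n' mysql-server-5.5`, assuming that is the installed package name.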

As Ori pointed out, we have enough temp space on osmium, another test server, so I'll start copying the data over there.

> the mysql server version before: 5.5.46-0ubuntu0.12.04.2 and after it's going to be jessie with 5.5.46-0ubuntu0.12.04.2,

ubuntu on Jessie? Copy/pasta fail?

Yes, edited :) It's 5.5.46-0+deb8u1.

Change 265097 had a related patch set uploaded (by Dzahn):
osmium: temp. add rsyncd to copy ruthenium data

https://gerrit.wikimedia.org/r/265097

Change 265097 merged by Dzahn:
osmium: temp. add rsyncd to copy ruthenium data

https://gerrit.wikimedia.org/r/265097

Dzahn added a comment. (Edited) Jan 19 2016, 8:38 PM

I added an rsyncd on osmium and data is being copied over now.

@ruthenium:/mnt/data# rsync -avz /mnt/data/ rsync://osmium.eqiad.wmnet/ruthenium
sending incremental file list
...

Let's check on the progress later...

(I stopped the mysql service and puppet before starting.)
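The temporary rsyncd on osmium came from Gerrit change 265097, which isn't reproduced in this task. Judging only from the push URL (`rsync://osmium.eqiad.wmnet/ruthenium`) and the `/srv/ruthenium` path mentioned at cleanup time, a module along these lines would accept the push. This is a hypothetical sketch, not the puppetized config; the `hosts allow` and `uid`/`gid` values are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: write a temporary rsync daemon config with one module,
# "ruthenium", that accepts the push of /mnt/data into /srv/ruthenium.
# Usage: write_rsyncd_conf FILE
write_rsyncd_conf() {
    cat > "$1" <<'EOF'
# Temporary module for the ruthenium jessie reinstall; remove afterwards.
[ruthenium]
path = /srv/ruthenium
comment = copy of ruthenium:/mnt/data
# The client pushes into this module, so it must be writable.
read only = no
# Act as root so that owners preserved by "rsync -a" (e.g. mysql) can be set.
uid = root
gid = root
# Only accept connections from the host being reinstalled (assumption).
hosts allow = ruthenium.eqiad.wmnet
EOF
}
```

`uid = root` matters here because a daemon started as root otherwise transfers files as an unprivileged user and cannot restore the original owners that `-a` asks for.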

About 190G of the 340G has been copied so far; still ongoing...

Change 265777 had a related patch set uploaded (by Dzahn):
dhcp: let ruthenium use jessie-installer

https://gerrit.wikimedia.org/r/265777

Change 265777 merged by Dzahn:
dhcp: let ruthenium use jessie-installer

https://gerrit.wikimedia.org/r/265777

Dzahn added a comment. (Edited) Jan 22 2016, 8:48 PM
  • The whole /mnt/data has been copied to osmium:
root@ruthenium:/mnt/data# rsync -avz /mnt/data/ rsync://osmium.eqiad.wmnet/ruthenium
..
sent 72,089,256,641 bytes  received 112,181,182 bytes  777,860.90 bytes/sec
total size is 336,439,244,685  speedup is 4.66
  • Changed the DHCP config to use the jessie installer.
  • Revoked the old puppet cert / salt key, reinstalled with jessie, re-added to puppet/salt, ran puppet, ...
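For reading the rsync summary in the first bullet: rsync reports "speedup" as the total size of the files divided by the bytes actually sent plus received, so a value above 1 reflects data that did not have to cross the wire (for example through `-z` compression, or files already present from an earlier run). A small sketch that reproduces the 4.66 figure from the printed numbers; the function name is hypothetical:

```shell
#!/bin/sh
# rsync_speedup TOTAL SENT RECEIVED
# Recomputes rsync's "speedup": total file size / (bytes sent + bytes received).
# Accepts numbers with thousands separators, as rsync prints them.
rsync_speedup() {
    total=$(printf '%s' "$1" | tr -d ,)
    sent=$(printf '%s' "$2" | tr -d ,)
    recv=$(printf '%s' "$3" | tr -d ,)
    awk -v t="$total" -v s="$sent" -v r="$recv" 'BEGIN { printf "%.2f\n", t / (s + r) }'
}

rsync_speedup 336,439,244,685 72,089,256,641 112,181,182   # prints 4.66
```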

It is back up with jessie, and nodejs got installed:

ii nodejs 4.2.4~dfsg-1~bpo8+1

The user accounts are back. Starting to copy the data back now.

You should be able to log in again.

The puppet run finishes but shows the expected errors, because upstart is not available: "Provider upstart is not functional on this host". The upstart jobs need to be converted to systemd.
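The puppetized services were written as upstart jobs, which jessie's systemd cannot run. As a hypothetical sketch of what one converted job might look like outside puppet, the helper below writes a minimal unit file; the service name, user, and command used with it are placeholders, not the actual testreduce or visualdiff setup. Note that jessie's systemd (version 215) expects an absolute path in `ExecStart=`.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal systemd unit that replaces an upstart job.
# Usage: write_unit FILE DESCRIPTION USER EXECSTART
write_unit() {
    cat > "$1" <<EOF
[Unit]
Description=$2
After=network.target

[Service]
User=$3
ExecStart=$4
# Rough equivalent of upstart's "respawn" stanza.
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}
```

For example (all values are placeholders): `write_unit /etc/systemd/system/testreduce.service "testreduce server" testreduce "/usr/bin/nodejs /srv/testreduce/server.js"`, followed by `systemctl daemon-reload`, `systemctl enable testreduce` and `systemctl start testreduce`.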

Is it possible to get sudo access as well?

You should have the same access as before. I didn't change anything about the access groups. They should be applied just like before, with parsoid-roots and parsoid-admins assigned to the hostname ruthenium in hiera.

The mount point /mnt/data that had all the data we copied was not fully puppetized. I did:

lvrename /dev/ruthenium-vg/_placeholder /dev/ruthenium-vg/tank
mkfs.ext4 /dev/ruthenium-vg/tank
mount /dev/ruthenium-vg/tank /mnt/data

/dev/mapper/ruthenium--vg-tank  870G   72M  826G   1% /mnt/data
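These commands create and mount the filesystem for the current boot only. The task doesn't show whether a matching `/etc/fstab` entry (or a puppet `mount` resource) was added; if not, the mount would be lost on reboot. A minimal, idempotent sketch of adding one, where `ensure_fstab_entry` is a hypothetical helper and the `defaults 0 2` options are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: append an fstab entry unless one already exists for the
# same mount point. Usage: ensure_fstab_entry FSTAB DEVICE MOUNTPOINT FSTYPE
ensure_fstab_entry() {
    fstab=$1 dev=$2 mp=$3 fstype=$4
    # Field 2 of a non-comment fstab line is the mount point.
    if awk -v mp="$mp" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        return 0
    fi
    # Default mount options, no dump, fsck after the root filesystem (pass 2).
    printf '%s\t%s\t%s\tdefaults\t0\t2\n' "$dev" "$mp" "$fstype" >> "$fstab"
}
```

For example, `ensure_fstab_entry /etc/fstab /dev/mapper/ruthenium--vg-tank /mnt/data ext4`; running it a second time leaves the file unchanged.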

Change 265849 had a related patch set uploaded (by Dzahn):
ruthenium: switch rsyncd setup over from osmium

https://gerrit.wikimedia.org/r/265849

Change 265849 merged by Dzahn:
ruthenium: switch rsyncd setup over from osmium

https://gerrit.wikimedia.org/r/265849

Data is being copied back from osmium to /mnt/data/ now; this will take a while.

I'll close this once that is done as well.

I think we should consider this a victory once https://gerrit.wikimedia.org/r/#/c/264032/ and https://gerrit.wikimedia.org/r/#/c/265628/ are merged and confirmed to work as well.

I understand those are related, but for the purpose of tracking the remaining Ubuntu systems (the parent task this one blocks), this is resolved, since it's not Ubuntu anymore. Maybe we can follow up with a subtask?

Dzahn added a comment. (Edited) Jan 22 2016, 10:19 PM

Ah, thanks! I just made T124480, but that's a duplicate then (or maybe not; I left it as is).

Dzahn added a comment. (Edited) Jan 25 2016, 7:16 PM

@ssastry All the data has been copied back from osmium (to /mnt/data/, like before). I synced /home/ too and copied back the old data from before the upgrade, so that includes some dotfiles and configs in your home, and @GWicke's data too, except the large dump files.

I'm removing the rsyncd config and calling this one resolved now.

Change 266289 had a related patch set uploaded (by Dzahn):
ruthenium: remove rsyncd, was for upgrade only

https://gerrit.wikimedia.org/r/266289

Change 266289 merged by Dzahn:
ruthenium: remove rsyncd, was for upgrade only

https://gerrit.wikimedia.org/r/266289

Dzahn closed this task as Resolved. Jan 25 2016, 7:57 PM

Mentioned in SAL [2016-03-15T16:27:32Z] <mutante> osmium - stopping rsyncd, removing remnants from backup job for ruthenium upgrade T122328

I forgot to clean up on osmium; Moritz asked about the rsyncd there.

On osmium: stopped rsyncd, deleted the config/init script, and deleted the /srv/ruthenium data that had been copied back after the upgrade.