sodium is an old lucid install and needs to be reinstalled with something more modern, like Debian jessie.
Description
Details
- Reference
- rt5420
Event Timeline
On Wed Jul 03 23:50:57 2013, thehelpfulone wrote:
Therefore please can we upgrade sodium from lucid to precise, which I believe is the current Ubuntu LTS release?
I've been wondering if we should perhaps go with Trusty instead of Precise. Precise has .14 of mailman while Trusty has .15.
On Tue, Oct 07, 2014 at 04:36:46PM +0000, John Lewis via RT wrote:
<URL: https://rt.wikimedia.org/Ticket/Display.html?id=5420 >
On Wed Jul 03 23:50:57 2013, thehelpfulone wrote:
Therefore please can we upgrade sodium from lucid to precise, which I believe is the current Ubuntu LTS release?

I've been wondering if we should perhaps go with Trusty instead of Precise. Precise has .14 of mailman while Trusty has .15.
The plan has been to go with mailman 2.1.18+, mainly because of better
DMARC support. I commented on the Debian bug tracker about this before
my leave[1] and that has been fixed since.
I've previously attempted a backport of mailman to precise and failed
due to a mess of intertangled dependencies, so indeed, the current plan
is to go with trusty, plus a backport of 2.1.18.
Upgrading to trusty (or precise for that matter) of course is going to
be hard with no downtime, so the plan (as briefly mentioned in RT #7141)
was to set up a new box, then migrate lists over.
Hope this helps,
Faidon
1: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=746592#27
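For reference, a backport like the one described could be attempted with the standard Ubuntu tooling. This is only a hedged sketch, not the actual procedure used: the source release to pull 2.1.18 from is an assumption, and the command is printed rather than executed here.

```shell
# Dry-run sketch of backporting mailman to trusty with ubuntu-dev-tools'
# backportpackage. run() only prints the command, so nothing is built here;
# the -s (source release) value is an assumption and would need checking.
run() { echo "+ $*"; }

run backportpackage -d trusty -s utopic mailman
```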
On Tue Oct 07 18:01:48 2014, faidon wrote:
The plan has been to go with mailman 2.1.18+, mainly because of better DMARC support. I commented on the Debian bug tracker about this before my leave[1] and that has been fixed since.

I've previously attempted a backport of mailman to precise and failed due to a mess of intertangled dependencies, so indeed, the current plan is to go with trusty, plus a backport of 2.1.18.
That sounds like a valid reason and a good method of implementation to me.
Upgrading to trusty (or precise for that matter) of course is going to be hard with no downtime, so the plan (as briefly mentioned in RT #7141) was to set up a new box, then migrate lists over.
Yeah, upgrading a fairly important and often-used service with no downtime is going to be hard, and that sounds like a good plan as well. If I remember correctly, Mark said the plan is to move it over to codfw after the initial build-out, once codfw is reasonably stable for production-like services.
As Ryan and Daniel are aware, I recently turned up as a volunteer with good knowledge of mailman, so I'm offering ops my help with anything mailman-related if they need it at any point; all it takes is a poke :)
Do we have any specific requirements for a new system for this, other than 'similar to sodium'?
This seems to have stalled out due to other items taking precedence, but the last update suggests this is good to go in Jessie, and we just need a new system allocated for the task?
looks like ~200G used ATM
/dev/mapper/sodium-mailman 280G 102G 179G 37% /var/lib/mailman
so from the spares list, this "Dell PowerEdge R420, single Intel Xeon E5-2450 v2 2.50GHz, 16GB Memory, (2) 500GB Disks" would do I think
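As a quick sanity check, the used figure can be pulled out of a df line like the one above with awk; a minimal sketch using the quoted line as sample input:

```shell
# Extract the "used" column (3rd field) from the df line quoted above.
line="/dev/mapper/sodium-mailman 280G 102G 179G 37% /var/lib/mailman"
used=$(echo "$line" | awk '{print $3}')
echo "$used"   # -> 102G
```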
Why not a (ganeti) VM? In any case, this ticket lacks an owner/assignee. Finding a machine for that is the easy part :)
does it make a difference that it needs a public IP? if it doesn't, a VM would be a good fit indeed. very true re: owner, cc @mark
Mailman handles its own mail processing and exim install on lists, so if my understanding is correct, a publicly accessible host would be a requirement for mailman.
Do we want to host the mailman archives in a VM? While the process itself isn't that demanding, it's just a lot of semi-static files for web viewing (they do require regeneration when posts/content are pulled, and that regeneration could be comparatively demanding).
@RobH, no, the process is not that demanding, in either CPU cycles or disk I/O.
For disk I/O, @Dzahn has added an I/O check in Icinga that so far only triggers on bacula backing up the machine, not during normal operations. For CPU cycles, I believe http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&h=sodium.wikimedia.org&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Miscellaneous+eqiad pretty much tells the story: backups are the only source of CPU usage, and even then it's I/O wait.
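The I/O-wait share can also be eyeballed straight from /proc/stat, whose aggregate cpu line lists iowait as the fifth counter after the label; a minimal sketch with made-up sample numbers:

```shell
# Compute iowait as a percentage of total jiffies from a /proc/stat-style
# line. The numbers below are invented for illustration; on a live host you
# would read the first line of /proc/stat instead.
sample="cpu 100 0 50 800 40 0 10"
echo "$sample" | awk '{total=$2+$3+$4+$5+$6+$7+$8; printf "%.0f\n", 100*$6/total}'   # -> 4
```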
So then it seems this is an ideal use case for a public-IP Ganeti VM, correct? (If so, we can create a request ticket per the instructions on:
https://wikitech.wikimedia.org/wiki/Operations_requests#Virtual_Machine_Requests_.28Production.29 )
I'm happy to play the test subject on the requestor side, but I wanted to make sure we're at that point.
RobH, yes it does. I've updated the above link and added https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM on wikitech. Please do request the VM so we can run this process for the very first time. Thanks!
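For anyone following along, creating such a VM on the Ganeti master roughly follows the wikitech page above. This is only a hedged dry-run sketch: the instance name, sizes, disk template, and network link are all assumptions, and the flags should be checked against gnt-instance(8).

```shell
# Dry-run sketch: run() only prints the command, nothing is created. A real
# invocation would be made on the Ganeti master; every value below is an
# illustrative placeholder, not the configuration actually used.
run() { echo "+ $*"; }

run gnt-instance add -t drbd -o debootstrap+default -B vcpus=2,memory=2G --disk 0:size=40G --net 0:link=public NEW-HOST.wikimedia.org
```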
We will set up a new mailman install on jessie on a Ganeti VM instead. After that is done, we will shut down sodium. So this will no longer be an upgrade of sodium itself.
re-naming ticket to reflect that. see progress for new system on T105756
mailman has now been migrated to fermium, so sodium is no longer actively used. we will keep it around for a few more days, just in case there are any issues that we notice later.
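The migration itself is not detailed in the ticket; one plausible shape for it, sketched as a dry run (the path comes from the df output earlier in the thread, the hostname from the ticket; flags are standard rsync and nothing is executed here):

```shell
# Dry-run sketch: run() only prints the command. A real migration would also
# stop the mailman qrunners on both ends before syncing, so the data being
# copied is not a moving target.
run() { echo "+ $*"; }

run rsync -az sodium.wikimedia.org:/var/lib/mailman/ /var/lib/mailman/
```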
root@sodium:/backup# shutdown -h now
W: molly-guard: SSH session detected!
Please type in hostname of the machine to shutdown: sodium
Broadcast message from dzahn@sodium
(/dev/pts/0) at 22:15 ...
The system is going down for halt NOW!
Shutting it down without wiping it is really dangerous: it means that it could come back up at any point due to e.g. power flapping, take its old IP and, at best, start communicating with the network without us even noticing (since we have no Icinga checks for it anymore!) and without being under configuration management control (e.g. revoked SSH accounts). At worst, we could have reassigned the IP in the meantime, and this box booting up could create an outage for whichever new box has taken its IP.
I believe we have a decom procedure somewhere. Let's follow it properly and wipe this box.
Totally, it's just another ticket, T110142 (it says "this ticket should be complete after DNS removal, disk wiping and taking hardware out of rack or reclaim" because of this process)
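For completeness, the wipe step of the decom procedure mentioned above could look something like the following hedged sketch; the device name is an assumption, and the command is only printed, precisely because the real thing is destructive.

```shell
# Dry-run sketch of wiping a disk before unracking. /dev/sda is an assumed
# device name, and shred's pass count is a policy choice, not a requirement;
# run() only prints the command, nothing is wiped here.
run() { echo "+ $*"; }

run shred -v -n 3 /dev/sda
```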