shutdown sodium after mailman has migrated to jessie VM
Closed, ResolvedPublic

Description

sodium is an old lucid install and needs to be reinstalled with something more modern, like Debian jessie.

Details

Reference
rt5420

Event Timeline

rtimport raised the priority of this task to Medium. Dec 18 2014, 1:38 AM
rtimport added a project: ops-core.
rtimport set Reference to rt5420.

Queue changed from ops-requests to core-ops by thehelpfulone

AdminCc thehelpfulone added by thehelpfulone

AdminCc dzahn@wikimediaorg added by thehelpfulone

Dependency by ticket #2905 added by thehelpfulone

AdminCc thehelpfulone deleted by thehelpfulone

AdminCc jeremyb added by jeremyb

AdminCc dzahn@wikimediaorg deleted by jeremyb

AdminCc dzahn added by jeremyb

AdminCc johnflewis93 added by johnflewis93

AdminCc dzahn deleted by johnflewis93

On Wed Jul 03 23:50:57 2013, thehelpfulone wrote:

Therefore please can we upgrade sodium from lucid to precise, which I
believe is the current Ubuntu LTS release?

I've been wondering if we should perhaps go with Trusty instead of Precise. Precise has .14 of mailman while Trusty has .15.

Status changed from 'new' to 'open' by RT_System

On Tue, Oct 07, 2014 at 04:36:46PM +0000, John Lewis via RT wrote:

<URL: https://rt.wikimedia.org/Ticket/Display.html?id=5420 >

On Wed Jul 03 23:50:57 2013, thehelpfulone wrote:

Therefore please can we upgrade sodium from lucid to precise, which I
believe is the current Ubuntu LTS release?

I've been wondering if we should perhaps go with Trusty instead of Precise. Precise has .14 of mailman while Trusty has .15.

The plan has been to go with mailman 2.1.18+, mainly because of better
DMARC support. I commented on the Debian bug tracker about this before
my leave[1] and that has been fixed since.
I've previously attempted a backport of mailman to precise and failed
due to a mess of intertangled dependencies, so indeed, the current plan
is to go with trusty, plus a backport of 2.1.18.
Upgrading to trusty (or precise for that matter) of course is going to
be hard with no downtime, so the plan (as briefly mentioned in RT #7141)
was to set up a new box, then migrate lists over.
Hope this helps,
Faidon
1: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=746592#27
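
The version requirement above (mailman 2.1.18 or newer, for DMARC support) can be checked mechanically. A minimal sketch, assuming plain dotted upstream version strings such as "2.1.18"; the function names are hypothetical, and Debian/Ubuntu epochs or revision suffixes (e.g. "1:2.1.18-2") are not handled:

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string like '2.1.18' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = "2.1.18") -> bool:
    """Return True if the installed version is at least the minimum.

    The default minimum is the 2.1.18 mentioned in the thread; tuples
    compare element by element, so 2.1.9 sorts below 2.1.18.
    """
    return parse_version(installed) >= parse_version(minimum)
```

Assuming ".14" and ".15" in the earlier comment mean 2.1.14 and 2.1.15, `meets_minimum("2.1.15")` is false, which matches the conclusion that a stock distribution package would not be enough and a backport is needed.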

On Tue Oct 07 18:01:48 2014, faidon wrote:

The plan has been to go with mailman 2.1.18+, mainly because of better
DMARC support. I commented on the Debian bug tracker about this before
my leave[1] and that has been fixed since.

I've previously attempted a backport of mailman to precise and failed
due to a mess of intertangled dependencies, so indeed, the current
plan is to go with trusty, plus a backport of 2.1.18.

That sounds like a valid reason and a good method of implementation to me.

Upgrading to trusty (or precise for that matter) of course is going to
be hard with no downtime, so the plan (as briefly mentioned in RT #7141)
was to set up a new box, then migrate lists over.

Yeah, upgrading a fairly important and frequently used service is going to be hard, and that sounds like a good plan as well. If I remember correctly, Mark said the plan is to move it over to codfw once codfw is reasonably stable for production-like services after the initial build-out.
As Ryan and Daniel are aware, I recently turned up as a volunteer with good knowledge of mailman. I'm offering my help to ops if they need it at any point for mailman; all it takes is a poke :)

Bugzilla ticket 72072 added by aklapper

faidon renamed this task from Upgrade sodium to precise to Upgrade sodium to jessie. Dec 18 2014, 5:12 PM
faidon claimed this task.
faidon updated the task description.
faidon changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)".
faidon changed the edit policy from "WMF-NDA (Project)" to "All Users".
faidon set Security to None.

Do we have any specific requirements for a new system for this, other than 'similar to sodium'?

This seems to have stalled because other items took precedence, but the last update suggests this is good to go on jessie, and we just need a new system allocated for the task?

Similar to sodium seems suitable for the task from what I can tell.

looks like ~200G used ATM

/dev/mapper/sodium-mailman
                      280G  102G  179G  37% /var/lib/mailman

so from the spares list, this "Dell PowerEdge R420, single Intel Xeon E5-2450 v2 2.50GHz, 16GB Memory, (2) 500GB Disks" would do I think
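
The sizing reasoning above can be sketched as a quick check. This is only an illustration: it assumes the two 500GB disks would be mirrored (so roughly one disk of usable space), that `df` reports binary gigabytes (GiB) while the disk spec uses decimal GB, and that we want half the volume free; the function names and the headroom value are hypothetical.

```python
def gb_to_gib(decimal_gb: float) -> float:
    """Convert vendor-style decimal gigabytes to binary gibibytes."""
    return decimal_gb * 10**9 / 2**30


def fits(used_gib: float, usable_gib: float, headroom: float = 0.5) -> bool:
    """Return True if current usage still leaves `headroom` of the volume free."""
    return used_gib <= usable_gib * (1.0 - headroom)


# Figures from the df output above: 102G used on /var/lib/mailman.
# Assumption: 2 x 500GB mirrored, so about one disk of usable space.
mailman_used_gib = 102
candidate_usable_gib = gb_to_gib(500)
candidate_is_big_enough = fits(mailman_used_gib, candidate_usable_gib)
```

With these assumptions the candidate has about 466 GiB usable, so the current 102G of list data fits with more than half the space to spare.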

Why not a (ganeti) VM? In any case, this ticket lacks an owner/assignee. Finding a machine for that is the easy part :)

does it make a difference that it needs a public ip? if it doesn't a VM would be a good fit indeed. very true re: owner, cc @mark

Mailman handles its own mail processing and there is an exim install on lists, so if my understanding is correct, a publicly accessible host would be a requirement for mailman.

Do we want to host mailman archives in a VM? While the process itself isn't that demanding, it's just a lot of semi-static files for web viewing (they do require regeneration when posts/content is pulled, and that process could be comparatively demanding).

@RobH, no, the process is not that demanding, neither in CPU cycles nor in disk I/O.
For disk I/O, @Dzahn has added an I/O check in icinga that up to now only triggers when bacula is backing up the machine, not during normal operations. For CPU cycles, I believe http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&h=sodium.wikimedia.org&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Miscellaneous+eqiad pretty much tells the story of backups being the only source of CPU usage. And even then, it's I/O wait.
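
The "it's I/O wait" observation can be reproduced on a Linux host without Ganglia by comparing two samples of the aggregate `cpu` line in `/proc/stat`. A minimal sketch, not what the icinga check or Ganglia actually do; the function names are hypothetical, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented in proc(5):

```python
def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_share(before: str, after: str) -> float:
    """Fraction of CPU time spent in iowait between two samples."""
    old, new = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [b - a for a, b in zip(old, new)]
    # Sum user..steal only; guest time is already included in user time.
    total = sum(deltas[:8])
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return deltas[4] / total  # index 4 is the iowait column


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()
```

To use it, call `read_cpu_line()` twice a few seconds apart and pass both results to `iowait_share`; a high value during a backup window and a value near zero otherwise would be consistent with the graph linked above.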

So then it seems this is an ideal use case for a public IP based ganeti VM, correct? If so, we can create a request ticket per the instructions on:

https://wikitech.wikimedia.org/wiki/Operations_requests#Virtual_Machine_Requests_.28Production.29

I'm happy to play the test subject for the requestor side, but I wanted to make sure we're at that point.

RobH, yes it does. I've updated the above link and added https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM in wikitech. Please do request the VM so we can run this process for the very first time. Thanks!

We will set up a new mailman install on jessie on a Ganeti VM instead. After that is done we will shut down sodium. So this will not be an upgrade of sodium itself anymore.

re-naming ticket to reflect that. see progress for new system on T105756

Dzahn renamed this task from Upgrade sodium to jessie to shutdown sodium after mailman has migrated to jessie VM. Aug 5 2015, 5:33 PM
Dzahn removed a parent task: Restricted Task. Aug 5 2015, 5:35 PM
Dzahn removed a parent task: Restricted Task. Aug 5 2015, 5:55 PM

mailman has now been migrated to fermium, so sodium is not actively used anymore. we will just keep it around for a few more days just in case there are any issues that we notice later

removing from puppet today but not shutdown yet

root@sodium:/backup# shutdown -h now
W: molly-guard: SSH session detected!
Please type in hostname of the machine to shutdown: sodium

Broadcast message from dzahn@sodium
(/dev/pts/0) at 22:15 ...

The system is going down for halt NOW!

Shutting it down without wiping it is really dangerous: it could come back up at any point (e.g. due to power flapping) and take its old IP. At best, it would start communicating with the network without us even noticing (since we have no Icinga checks for it anymore!) and without being under configuration management control (e.g. revoked SSH accounts would still be on it). At worst, we could have reassigned the IP in the meantime, and this box booting up could create an outage for whichever new box has taken its IP.

I believe we have a decom procedure somewhere. Let's follow it properly and wipe this box.

Totally, it's just another ticket, T110142 (it says "this ticket should be complete after DNS removal, disk wiping and taking hardware out of rack or reclaim" because of this process)
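
One of the decom steps named above, DNS removal, is easy to verify after the fact. A minimal sketch, assuming a simple forward lookup is enough; the function name and the injectable `resolver` parameter are hypothetical and not part of any existing decom tooling:

```python
import socket


def still_resolves(hostname: str, resolver=socket.gethostbyname) -> bool:
    """Return True if the hostname still resolves to an address.

    The resolver is injectable so the check can be exercised without
    network access; by default it uses the system resolver.
    """
    try:
        resolver(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    return True
```

Calling `still_resolves("sodium.wikimedia.org")` after the DNS change should return False once the record is gone and caches have expired. It says nothing about disk wiping or the hardware itself, which still need the rest of the procedure.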