sodium is an old lucid install and needs to be reinstalled with something more modern, like Debian jessie.
Description
Details
- Reference
- rt5420
Event Timeline
On Wed Jul 03 23:50:57 2013, thehelpfulone wrote:
Therefore please can we upgrade sodium from lucid to precise, which I believe is the current Ubuntu LTS release?
I've been wondering if we should perhaps go with Trusty instead of Precise. Precise has .14 of mailman while Trusty has .15.
On Tue, Oct 07, 2014 at 04:36:46PM +0000, John Lewis via RT wrote:
<URL: https://rt.wikimedia.org/Ticket/Display.html?id=5420 >
On Wed Jul 03 23:50:57 2013, thehelpfulone wrote:
Therefore please can we upgrade sodium from lucid to precise, which I believe is the current Ubuntu LTS release?

I've been wondering if we should perhaps go with Trusty instead of Precise. Precise has .14 of mailman while Trusty has .15.
The plan has been to go with mailman 2.1.18+, mainly because of better
DMARC support. I commented on the Debian bug tracker about this before
my leave[1] and that has been fixed since.
I've previously attempted a backport of mailman to precise and failed
due to a mess of intertangled dependencies, so indeed, the current plan
is to go with trusty, plus a backport of 2.1.18.
Upgrading to trusty (or precise for that matter) of course is going to
be hard with no downtime, so the plan (as briefly mentioned in RT #7141)
was to set up a new box, then migrate lists over.
Hope this helps,
Faidon
1: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=746592#27
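For reference, a backport like the one described could be attempted with the standard Ubuntu tooling. This is only a hedged sketch, not the actual procedure used: the source release to pull 2.1.18 from is an assumption, and the command is printed rather than executed here.

```shell
# Dry-run sketch of backporting mailman to trusty with ubuntu-dev-tools'
# backportpackage. run() only prints the command, so nothing is built here;
# the -s (source release) value is an assumption and would need checking.
run() { echo "+ $*"; }

run backportpackage -d trusty -s utopic mailman
```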
On Tue Oct 07 18:01:48 2014, faidon wrote:
The plan has been to go with mailman 2.1.18+, mainly because of better DMARC support. I commented on the Debian bug tracker about this before my leave[1] and that has been fixed since.

I've previously attempted a backport of mailman to precise and failed due to a mess of intertangled dependencies, so indeed, the current plan is to go with trusty, plus a backport of 2.1.18.
That sounds like a valid reason and a good method of implementation to me.
Upgrading to trusty (or precise for that matter) of course is going to be hard with no downtime, so the plan (as briefly mentioned in RT #7141) was to set up a new box, then migrate lists over.
Yeah, upgrading a fairly important and often-used service with no downtime is going to be hard, and that sounds like a good plan as well. If I remember correctly, Mark said the plan is to move it over to codfw after the initial build-out, once codfw is reasonably stable for production-like services.
As Ryan and Daniel are aware, I recently turned up as a volunteer with good knowledge of mailman, so I'm offering ops my help with anything mailman-related if they need it at any point; all it takes is a poke :)
Do we have any specific requirements for a new system for this, other than 'similar to sodium'?
This seems to have stalled out due to other items taking precedence, but the last update suggests this is good to go in Jessie, and we just need a new system allocated for the task?
looks like ~200G used ATM
/dev/mapper/sodium-mailman 280G 102G 179G 37% /var/lib/mailman
so from the spares list, this "Dell PowerEdge R420, single Intel Xeon E5-2450 v2 2.50GHz, 16GB Memory, (2) 500GB Disks" would do I think
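As a quick sanity check, the used figure can be pulled out of a df line like the one above with awk; a minimal sketch using the quoted line as sample input:

```shell
# Extract the "used" column (3rd field) from the df line quoted above.
line="/dev/mapper/sodium-mailman 280G 102G 179G 37% /var/lib/mailman"
used=$(echo "$line" | awk '{print $3}')
echo "$used"   # -> 102G
```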
Why not a (ganeti) VM? In any case, this ticket lacks an owner/assignee. Finding a machine for that is the easy part :)
does it make a difference that it needs a public IP? if it doesn't, a VM would be a good fit indeed. very true re: owner, cc @mark
Mailman handles its own mail processing and exim install on lists, so if my understanding is correct, a publicly accessible host would be a requirement for mailman.
Do we want to host the mailman archives in a VM? While the process itself isn't that demanding, it's just a lot of semi-static files for web viewing (they do require regeneration when posts/content are pulled, and that regeneration could be comparatively demanding).
@RobH, no, the process is not that demanding, in either CPU cycles or disk I/O.
For disk I/O, @Dzahn has added an I/O check in Icinga that so far only triggers on bacula backing up the machine, not during normal operations. For CPU cycles, I believe http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&h=sodium.wikimedia.org&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Miscellaneous+eqiad pretty much tells the story: backups are the only source of CPU usage, and even then it's I/O wait.
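The I/O-wait share can also be eyeballed straight from /proc/stat, whose aggregate cpu line lists iowait as the fifth counter after the label; a minimal sketch with made-up sample numbers:

```shell
# Compute iowait as a percentage of total jiffies from a /proc/stat-style
# line. The numbers below are invented for illustration; on a live host you
# would read the first line of /proc/stat instead.
sample="cpu 100 0 50 800 40 0 10"
echo "$sample" | awk '{total=$2+$3+$4+$5+$6+$7+$8; printf "%.0f\n", 100*$6/total}'   # -> 4
```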
So then it seems this is an ideal use case for a public-IP Ganeti VM, correct? (If so, we can create a request ticket per the instructions on:
https://wikitech.wikimedia.org/wiki/Operations_requests#Virtual_Machine_Requests_.28Production.29 )
I'm happy to play the test subject on the requestor side, but I wanted to make sure we're at that point.
RobH, yes it does. I've updated the above link and added https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM on wikitech. Please do request the VM so we can run this process for the very first time. Thanks!
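For anyone following along, creating such a VM on the Ganeti master roughly follows the wikitech page above. This is only a hedged dry-run sketch: the instance name, sizes, disk template, and network link are all assumptions, and the flags should be checked against gnt-instance(8).

```shell
# Dry-run sketch: run() only prints the command, nothing is created. A real
# invocation would be made on the Ganeti master; every value below is an
# illustrative placeholder, not the configuration actually used.
run() { echo "+ $*"; }

run gnt-instance add -t drbd -o debootstrap+default -B vcpus=2,memory=2G --disk 0:size=40G --net 0:link=public NEW-HOST.wikimedia.org
```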
We will set up a new mailman install on jessie on a Ganeti VM instead. After that is done, we will shut down sodium. So this will no longer be an upgrade of sodium itself.
re-naming ticket to reflect that. see progress for new system on T105756
mailman has now been migrated to fermium, so sodium is no longer actively used. we will keep it around for a few more days, just in case there are any issues that we notice later.
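The migration itself is not detailed in the ticket; one plausible shape for it, sketched as a dry run (the path comes from the df output earlier in the thread, the hostname from the ticket; flags are standard rsync and nothing is executed here):

```shell
# Dry-run sketch: run() only prints the command. A real migration would also
# stop the mailman qrunners on both ends before syncing, so the data being
# copied is not a moving target.
run() { echo "+ $*"; }

run rsync -az sodium.wikimedia.org:/var/lib/mailman/ /var/lib/mailman/
```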
root@sodium:/backup# shutdown -h now
W: molly-guard: SSH session detected!
Please type in hostname of the machine to shutdown: sodium
Broadcast message from dzahn@sodium
(/dev/pts/0) at 22:15 ...
The system is going down for halt NOW!
Shutting it down without wiping it is really dangerous: it means that it could come back up at any point due to e.g. power flapping, take its old IP and, at best, start communicating with the network without us even noticing (since we have no Icinga checks for it anymore!) and without being under configuration management control (e.g. revoked SSH accounts). At worst, we could have reassigned the IP in the meantime, and this box booting up could create an outage for whichever new box has taken its IP.
I believe we have a decom procedure somewhere. Let's follow it properly and wipe this box.
Totally, it's just another ticket, T110142 (it says "this ticket should be complete after DNS removal, disk wiping and taking hardware out of rack or reclaim" because of this process)
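For completeness, the wipe step of the decom procedure mentioned above could look something like the following hedged sketch; the device name is an assumption, and the command is only printed, precisely because the real thing is destructive.

```shell
# Dry-run sketch of wiping a disk before unracking. /dev/sda is an assumed
# device name, and shred's pass count is a policy choice, not a requirement;
# run() only prints the command, nothing is wiped here.
run() { echo "+ $*"; }

run shred -v -n 3 /dev/sda
```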