Figure out a way to sync old and new mailman
Closed, ResolvedPublic

Assigned To

None

Authored By

	Ladsgroup
	Jun 27 2020, 4:10 PM

Description

Mailman3 has a script to migrate the old archives to the new system but there's no way to do it the other way around. Let's find a way to keep the old and new structure the same.

Related Objects

Status	Subtype	Assigned	Task
Resolved	Security	None	T181803 Stop storing Mailman passwords in plain text
Resolved		None	T118641 Implement proper AAA for lists.wikimedia.org (mailman)
Resolved		None	T190054 List archives on lists.wikimedia.org is not mobile friendly
Resolved		None	T115329 "From" at start of line becomes ">From" in pipermail
Resolved		None	T52864 Upgrade GNU Mailman from 2.1 to Mailman3
Resolved		None	T256539 Figure out a way to sync old and new mailman

Event Timeline

Ladsgroup created this task. Jun 27 2020, 4:10 PM

Restricted Application added a subscriber: Aklapper. Jun 27 2020, 4:10 PM

jcrespo moved this task from Backlog to Mailman v3 on the Wikimedia-Mailing-lists board. Jul 8 2020, 10:05 AM

herron triaged this task as Low priority. Jul 27 2020, 8:04 PM

Mail from @Ladsgroup to list admins:

Mailman allows us to upgrade one mailing list at a time, which is good, but we haven't found a way to keep the old and new versions in sync (archives, etc.). Maybe once we migrate a mailing list, its archives in the old version simply stop being updated. Would that work for you? Feel free to chime in: https://phabricator.wikimedia.org/T256539

As an amateurish list admin, I chime in saying: surely it would. It's probably easier to put a link on the "new" archives saying "For archives prior to Day X, please see <here>" than to keep syncing everything forever?

To be clear, we can import all of the legacy mailman 2 archives into hyperkitty, right?

If so, I have a rough idea on how to set up a redirector from mailman2 URLs to hyperkitty. Each email has a unique Message-Id (though if a message was cross-posted, it might not be globally unique), which we can use to match the URLs. I'm currently scraping the pipermail archives to make a mapping of URL to Message-Id. Then I'd figure out how to parse the hyperkitty archives and set up a redirector.
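A rough sketch of the scraping step, in Python. It assumes (unverified here) that each pipermail article page carries its own Message-Id in the `In-Reply-To=` parameter of its reply `mailto:` link; the function names are made up for illustration. Pages where no Message-Id can be found map to `None` so they can be counted and handled separately.

```python
import re
from urllib.parse import unquote

# Assumed pipermail markup: the reply link looks like
#   mailto:list%40lists.example.org?Subject=...&In-Reply-To=%3Cmsgid%3E
# and its In-Reply-To value is this article's own Message-Id.
IN_REPLY_TO = re.compile(r'In-Reply-To=([^"&>\s]+)')


def message_id_from_article(html):
    """Return the article's Message-Id without angle brackets, or None."""
    match = IN_REPLY_TO.search(html)
    if match is None:
        return None
    message_id = unquote(match.group(1)).strip()
    if message_id.startswith("<") and message_id.endswith(">"):
        message_id = message_id[1:-1]
    return message_id or None


def build_mapping(pages):
    """Map pipermail URL paths (/pipermail/<list>/<month>/<n>.html) to Message-Ids.

    `pages` is a dict of path -> HTML. The path (and so the list name) stays
    the key because a cross-posted message can share one Message-Id across lists.
    """
    return {path: message_id_from_article(html) for path, html in pages.items()}
```

Keying on the pipermail path rather than on the Message-Id sidesteps the cross-posting problem: the redirector only ever needs to go from an old URL to one message on one list.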

In T256539#6428881, @Legoktm wrote:

To be clear, we can import all of the legacy mailman 2 archives into hyperkitty, right?

According to the Mailman documentation, it should be possible: https://docs.mailman3.org/en/latest/migration.html#upgrade-strategy

If so, I have a rough idea on how to set up a redirector from mailman2 URLs to hyperkitty. Each email has a unique Message-Id (though if a message was cross-posted, it might not be globally unique), which we can use to match the URLs. I'm currently scraping the pipermail archives to make a mapping of URL to Message-Id. Then I'd figure out how to parse the hyperkitty archives and set up a redirector.

The problem is that message URLs in hyperkitty use some kind of hash (I don't know what type of hash, tbh). Here's an example: https://lists.wmcloud.org/hyperkitty/list/test@lists.wmcloud.org/thread/RMQPKSS4ID3WALFXAF636J2NGBVCN3UA/
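As far as I know the hash is not random: Mailman 3's "stable URL" scheme derives it from the Message-Id, as the base32 encoding of the SHA-1 digest of the Message-Id with its angle brackets removed (this is what Mailman core puts in the `X-Message-ID-Hash` header; worth confirming against the version we install). The thread URL in the example would then be the hash of the thread's first message. A minimal Python sketch; the per-message URL layout (`/hyperkitty/list/<address>/message/<hash>/`) is an assumption modelled on the thread URL above.

```python
import hashlib
from base64 import b32encode


def message_id_hash(message_id):
    """Base32 SHA-1 of a Message-Id, as used (we believe) in Mailman 3 stable URLs.

    Angle brackets are not part of the Message-Id, so they are stripped first.
    """
    message_id = message_id.strip()
    if message_id.startswith("<") and message_id.endswith(">"):
        message_id = message_id[1:-1]
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    # A 20-byte SHA-1 digest encodes to exactly 32 base32 characters, no padding.
    return b32encode(digest).decode("ascii")


def hyperkitty_message_url(base, list_address, message_id):
    """Build a HyperKitty per-message URL (path layout is an assumption)."""
    return f"{base}/hyperkitty/list/{list_address}/message/{message_id_hash(message_id)}/"
```

If this holds, the pipermail URL -> Message-Id mapping is all we need; no lookup against HyperKitty's database or API would be required to compute redirect targets.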

In T256539#6428881, @Legoktm wrote:

Each email has a unique Message-Id (though if a message was cross-posted, it might not be globally unique), which we can use to match the URLs. I'm currently scraping the pipermail archives to make a mapping of URL to Message-Id.

My assumption that every email has a Message-Id might not be true. As of last week, there are 1.02M archived mails in pipermail, and 61k don't have a Message-Id. Some of them look like cases where people have edited the archives (redacting something, I guess), but for others I couldn't figure out a reason. Maybe the Message-Id is in the mbox archive? Someone with shell access would need to examine those, I think. I can upload the 123 MB .csv file I have if people want to take a look.
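For whoever checks with shell access, a quick Python sketch using the standard `mailbox` module that counts, for one list's raw mbox archive, how many messages have a Message-Id and which ones don't (by position in the mbox). The function name is made up, and the mbox path to pass in is whatever the list's archive file is on the server.

```python
import mailbox


def split_by_message_id(mbox_path):
    """Return (number of messages with a Message-ID, indexes of those without)."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        with_id = 0
        missing = []
        for index, message in enumerate(box):
            # Header lookup is case-insensitive, so Message-Id/Message-ID both match.
            if message.get("Message-ID"):
                with_id += 1
            else:
                missing.append(index)
        return with_id, missing
    finally:
        box.close()
```

If the mbox has Message-Ids for messages that pipermail's HTML lacks, the mapping could be filled in from the mbox instead.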

(Sidenote: some lists have way too big archives: T262773: Stop archiving the wikidata-bugs mailinglist in pipermail)

In T256539#6456323, @Ladsgroup wrote:

In T256539#6428881, @Legoktm wrote:

To be clear, we can import all of the legacy mailman 2 archives into hyperkitty, right?

According to the Mailman documentation, it should be possible: https://docs.mailman3.org/en/latest/migration.html#upgrade-strategy

Can we try importing a list? I'm worried about "The one defect that will definitely cause problems is lines beginning with From in message bodies." which we've been tracking as T115329: "From" at start of line becomes ">From" in pipermail.
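The defect comes from the classic mbox format: a body line starting with "From " would look like a message separator, so writers escape it as ">From " and readers never undo that, which makes the change lossy. Python's standard `mailbox` module behaves the same way, so a few lines reproduce it; this only illustrates the escaping and is not a claim about how the HyperKitty importer reads mboxes.

```python
import email
import mailbox
import os
import tempfile


def roundtrip_body(body):
    """Write a message with `body` to a fresh mbox and return the body read back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.mbox")
        box = mailbox.mbox(path)
        # mailbox.mbox escapes body lines beginning with "From " when writing.
        box.add(email.message_from_string("Subject: demo\n\n" + body))
        box.flush()
        box.close()
        box = mailbox.mbox(path, create=False)
        try:
            return next(iter(box)).get_payload()
        finally:
            box.close()


print(roundtrip_body("From the start of a line\n"))
```

The printed body starts with ">From", which is the same mangling tracked in T115329, so an imported list is a good place to check how many messages are affected.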

The problem is that message URLs in hyperkitty use some kind of hash (I don't know what type of hash, tbh). Here's an example: https://lists.wmcloud.org/hyperkitty/list/test@lists.wmcloud.org/thread/RMQPKSS4ID3WALFXAF636J2NGBVCN3UA/

I think the API https://lists.wmcloud.org/hyperkitty/api/ should be good enough for what I'm trying to do, but it would be nice to have an imported list to test with.

Coming back to this, I think we should:

Get the Internet Archive to scrape all the current pipermail archives/views as a fail-safe (this is currently prevented with https://lists.wikimedia.org/robots.txt)
Finish generating the mapping of pipermail URL -> message-id so we can set up a redirector to hyperkitty archives
Import the lists into mailman3 and see if we're satisfied with the new archives + redirects (we need to define criteria for "satisfied")
- If not, create a lists2-static.wikimedia.org archive with the final HTML archives and redirect lists.wikimedia.org/pipermail/* to it.
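One possible way to combine the redirector with the static fallback, sketched in Python: send an old pipermail URL to HyperKitty when the mapping has an entry, and to the static HTML copy otherwise (month indexes, messages without a Message-Id). `redirect_target` is a hypothetical name, and keeping the `/pipermail/...` layout under lists2-static.wikimedia.org is an assumption.

```python
# Host name from the plan above; the /pipermail/ layout under it is assumed.
STATIC_BASE = "https://lists2-static.wikimedia.org"


def redirect_target(path, known_urls):
    """Pick the redirect target for a legacy /pipermail/ path.

    known_urls maps pipermail paths to HyperKitty URLs, built from the
    pipermail URL -> Message-Id mapping. Anything missing from it falls
    back to the frozen static HTML archive.
    """
    if not path.startswith("/pipermail/"):
        return None  # not a legacy archive URL; leave it alone
    return known_urls.get(path, STATIC_BASE + path)
```

Falling back per URL means a gap in the mapping degrades to the old page rather than a 404.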

Besides the archive aspect of the upgrade, there's the aspect of supporting both versions at once, which bothers me a lot and which I haven't found a good solution for yet. Imagine we want a migration period where mailman2 and mailman3 run in production at the same time and we upgrade one mailing list after another. What should that look like?

Should both live in lists.wikimedia.org?
- Meaning two roles on mailman1001, with exim routing mail to the different mailmans, which will be quite fun, plus many other complexities.
Should migrated mailing lists live on lists3.wikimedia.org or lists-next.wikimedia.org and then be moved back later?
- How would that be possible? Wouldn't changing the domain after migration hurt the archives?
Maybe come up with a cool but different name like "lists-new.wikimedia.org" and keep it after the migration?
Mailman3 can serve its web interface on a domain different from the email domain.
- So keep the web interface on lists-next.wikimedia.org (and eventually change it back) while mail is still sent to lists.wikimedia.org
  - How do we deliver mail from the old one to the new one?

python.org and some other FLOSS orgs have done this upgrade. We should look at how it was done there.

You did skip over the easy option - declare a downtime for X hours, migrate everything over, and then bring it all back up on mailman3. Implementation wise it's the easiest but requires us to be very confident in our testing that stuff won't go wrong and when it inevitably does, be ready to immediately fix stuff. I don't think we're going to reach that level of confidence though.

Note that I haven't thought about how to go about implementing this yet.

In T256539#6902763, @Ladsgroup wrote:

Besides the archive aspect of the upgrade, there's the aspect of supporting both versions at once, which bothers me a lot and which I haven't found a good solution for yet. Imagine we want a migration period where mailman2 and mailman3 run in production at the same time and we upgrade one mailing list after another. What should that look like?

In my head the upgrade happens in 3ish stages:

1. Opt-in test lists where the admins volunteer to go first
2. Small/medium lists
3. Large, modern lists
   3.5. Large, very old lists (wikipedia-l, wikitech-l, etc. Anything with content before, say, 2010? Not sure where the cutoff should be)

It would be nice if we could do the archive imports ahead of time, that would help us track down issues, especially with the legacy lists. Actually that's my main worry right now, the exim stuff seems rather straightforward.

Should both live in lists.wikimedia.org?

During the migration period, sending emails to <name>@lists.wikimedia.org should work, regardless of which mailman is running the list. Ideally, users would also receive mail via lists.wikimedia.org so mail filters, etc. all continue to work.
In the end, mailman3 should be on lists.wikimedia.org.

Meaning two roles on mailman1001, with exim routing mail to the different mailmans, which will be quite fun, plus many other complexities.

I guess we'd need to have a list in puppet/hiera for those that are migrated to route them differently.

Should migrated mailing lists live on lists3.wikimedia.org or lists-next.wikimedia.org and then be moved back later?

...if we have to. The URL paths should be different, right?

So in theory we could route ^/(pipermail|mailman)/ to mailman2 and ^/(archives|admin)/ to mailman3.
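The path-based split above amounts to a small lookup, sketched here in Python to make the rule explicit; in production it would be Apache rewrite/proxy rules, not Python. The two prefixes per backend are the ones proposed above; note that the test instance serves HyperKitty under /hyperkitty/ (see the example URL earlier), so the mailman3 pattern would have to match whatever prefixes the real install uses.

```python
import re

# Proposed prefixes; which backend owns which prefix is the idea, not a tested config.
ROUTES = [
    (re.compile(r"^/(pipermail|mailman)/"), "mailman2"),
    (re.compile(r"^/(archives|admin)/"), "mailman3"),
]


def backend_for(path):
    """Return which mailman should serve a web request path, or None if neither."""
    for pattern, backend in ROUTES:
        if pattern.match(path):
            return backend
    return None
```

As long as the two path sets never overlap, both versions can share lists.wikimedia.org without per-list web routing.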

Now that I've typed it out, I don't think it would be too bad if we can figure out the exim config. We would just install the mailman3 stuff on the existing lists1001 VM, have a list of all the non-migrated lists in puppet, pass that list/regex to exim and apache to route mails and web requests accordingly. As we migrate lists, we remove them from the puppet/hiera list and exim/apache will reroute to mailman3 accordingly. (Please, poke holes in this)

As a bonus, we keep the same IP, which means it doesn't lose reputation in all the spam filtering things out there.

Maybe come up with a cool but different name like "lists-new.wikimedia.org" and after migration leave it like that?

-1

python.org and some other FLOSS orgs have done this upgrade. We should look at how it was done there.

Definitely. I know of https://fedoraproject.org/wiki/Mailman3_Migration as well.

You don't need a list of mailing lists in puppet: the exim4 routing checks whether a directory exists under mailman3 for that specific mailing list, and if it doesn't, it doesn't route the mail to mailman3.
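The same decision, written out as a Python sketch for clarity (the real check would live in the exim router; as I recall, the exim example in the Mailman 3 docs uses `require_files` on a per-list path with an optional command suffix). The lists directory path and the abbreviated suffix set are assumptions, and "mailman2" here just means "fall through to the next router".

```python
from pathlib import Path

# Hypothetical location of mailman3's per-list directories.
DEFAULT_LISTS_DIR = Path("/var/lib/mailman3/lists")

# Abbreviated set of list command suffixes (e.g. list-request@, list-bounces@).
SUFFIXES = ("-bounces", "-confirm", "-join", "-leave", "-owner", "-request",
            "-subscribe", "-unsubscribe")


def list_name(local_part):
    """Strip a VERP-style '+...' tag and an optional command suffix."""
    base = local_part.split("+", 1)[0]
    for suffix in SUFFIXES:
        if base.endswith(suffix):
            return base[: -len(suffix)]
    return base


def route(local_part, domain, lists_dir=DEFAULT_LISTS_DIR):
    """Deliver to mailman3 if its directory for the list exists, else fall through."""
    candidate = Path(lists_dir) / f"{list_name(local_part)}.{domain}"
    return "mailman3" if candidate.exists() else "mailman2"
```

Migrating a list then flips its routing as soon as mailman3 creates the list, with no puppet/hiera change.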

Ladsgroup mentioned this in T278495: Figure out plan for mailman IP situation. Mar 26 2021, 6:11 PM

Ladsgroup mentioned this in T278610: Install mailman3 on lists1001.wikimedia.org. Mar 27 2021, 10:41 AM

RhinosF1 subscribed. Mar 27 2021, 10:51 AM

I think there's a clear picture of what to do next now. I'm calling this resolved; I've already created a ticket for installing mailman3 on lists1001.wikimedia.org: T278610: Install mailman3 on lists1001.wikimedia.org

Legoktm mentioned this in T52864: Upgrade GNU Mailman from 2.1 to Mailman3. Apr 3 2021, 10:18 PM

Legoktm mentioned this in T278905: Reconsider which mailman3 version we're running. Apr 6 2021, 6:12 AM

Legoktm mentioned this in T280731: Implement static redirects from pipermail archives to hyperkitty archives. Apr 20 2021, 8:08 PM
