
cleanup mailman archives - introduce apache rewrites
Closed, Resolved, Public

Description

Our current documentation says that when a mailing list is renamed, we add Apache redirects for the listinfo page and migrate the archives, but leave a copy of the old archives in place. Keeping the old archive URLs working makes sense, since those archives are linked from outside the Mailman software. However, there is no reason to do this with duplicate archives when an Apache redirect would suffice.

As such, I'd like to add archive redirects alongside all the existing listinfo redirects, after confirming that each new list's archives contain the full content of the corresponding old list's archives. As long as the new archives are inclusive, I'll add a rewrite pointing to the new archives and confirm it works. Once the redirects are confirmed working, I'll remove all the old list archives.
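The inclusivity check described above could be scripted roughly as follows. This is a minimal sketch, assuming we already have per-month message counts for each archive (for example, parsed from the pipermail index pages); the `ArchiveCounts` format and the function names are hypothetical. It compares message counts rather than file sizes, since archive size per month can differ after rebuilds even when the messages are the same.

```python
from typing import Dict, List

# Hypothetical format: month directory name (as pipermail uses it,
# e.g. "2004-April") -> number of messages archived in that month.
ArchiveCounts = Dict[str, int]


def missing_from_new(old: ArchiveCounts, new: ArchiveCounts) -> List[str]:
    """Return the months where the new archive has fewer messages than the old one."""
    problems = []
    for month, old_count in sorted(old.items()):
        # A month absent from the new archive counts as zero messages.
        if new.get(month, 0) < old_count:
            problems.append(month)
    return problems


def is_inclusive(old: ArchiveCounts, new: ArchiveCounts) -> bool:
    """True if every old month is fully covered by the new archive."""
    return not missing_from_new(old, new)


if __name__ == "__main__":
    # Illustrative numbers only, not real counts from either list.
    old = {"2004-April": 120, "2004-May": 95}
    new = {"2004-April": 120, "2004-May": 95, "2012-January": 300}
    print(is_inclusive(old, new))
```

Only once a check like this passes for a given rename would it be safe to add the redirect and delete the old copy.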

This will de-duplicate list archive data considerably, leaving the ongoing migration with less data to process. (Plus, why keep duplicate data anyhow?)

A quick comparison of the old foundation-l archives with the new wikimedia-l archives shows a few oddities, which I think are due to list rebuilds and recompression over time:

https://lists.wikimedia.org/pipermail/wikimedia-l/
https://lists.wikimedia.org/pipermail/wikimedia-l/2004-April/thread.html

https://lists.wikimedia.org/pipermail/foundation-l/
https://lists.wikimedia.org/pipermail/foundation-l/2004-April/thread.html

So the actual numbers of messages, subjects, and authors have remained unchanged, but the threading indentation and the overall archive size per month have changed. I attribute this to the active list having had more recent arch rebuilds.

Event Timeline

RobH claimed this task.
RobH raised the priority of this task from to Medium.
RobH updated the task description.
RobH added subscribers: RobH, Dzahn, JohnLewis.

Note that we'll have to update the documentation to match this; @JohnLewis has already started some rewrites (with the rest of us contributing).

This will result in some list archive renumbering, which would happen eventually no matter what. When these are all imported in the future, we'll have to account for all of them, and they may need rebuilding.

So delaying for that reason alone (the only reason I can think of to delay this) doesn't seem justified.

I've live-hacked a test onto sodium that redirects all archive traffic for the old foundation-l list to wikimedia-l, and it works. Direct links to individual archive pages are also rewritten to their new equivalents. For the earlier months, the mapping was one to one. (This may change later in the archives if something was intentionally scrubbed.)
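The redirect logic above amounts to swapping the list name in the pipermail path and keeping everything after it. The sketch below models that mapping in Python so it can be checked against sample URLs; it is an assumption about the shape of the rule, not the actual sodium or puppet configuration. In Apache terms it roughly corresponds to a mod_rewrite rule such as `RewriteRule ^/pipermail/foundation-l/(.*)$ /pipermail/wikimedia-l/$1 [R=301,L]`. The `RENAMED_LISTS` table is hypothetical apart from the two renames mentioned in this task.

```python
import re
from typing import Optional

# Hypothetical rename table: old list name -> new list name.
RENAMED_LISTS = {
    "foundation-l": "wikimedia-l",
    "wikidata-l": "wikidata",
}

# Capture the list name (one path segment) and whatever follows it.
ARCHIVE_RE = re.compile(r"^/pipermail/([^/]+)(/.*)?$")


def rewrite_archive_path(path: str) -> Optional[str]:
    """Return the new archive path for a renamed list, or None if no redirect applies."""
    m = ARCHIVE_RE.match(path)
    if not m:
        return None  # not a pipermail archive URL
    old_name, rest = m.group(1), m.group(2) or "/"
    new_name = RENAMED_LISTS.get(old_name)
    if new_name is None:
        return None  # list was not renamed, serve as-is
    # Keep the month/message part unchanged; this assumes message
    # numbering is one to one between the old and new archives.
    return f"/pipermail/{new_name}{rest}"


if __name__ == "__main__":
    print(rewrite_archive_path("/pipermail/foundation-l/2004-April/thread.html"))
```

Because the month and message portion of the path is passed through untouched, this only gives correct per-message links where the old and new archives number messages identically, which matched what was observed for the earlier months.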

As such, my next task will be to track down not only the existing redirected lists (which is easy, since we already redirect their listinfo pages), but also potentially to audit the Mailman archives as a whole and ensure every archive has a matching listinfo destination.
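The audit could start from three inputs: the archive directories on disk, the currently active lists, and the existing listinfo redirects. The sketch below assumes those have already been collected into Python sets and a dict; how they are gathered (directory listing, Mailman's list of lists, parsing the Apache config) is left open, and the function names and the `example-l` list are hypothetical.

```python
from typing import Dict, List, Set


def find_orphan_archives(
    archives: Set[str], active: Set[str], redirects: Dict[str, str]
) -> List[str]:
    """Archives with neither an active list nor a listinfo redirect."""
    return sorted(a for a in archives if a not in active and a not in redirects)


def find_broken_redirects(active: Set[str], redirects: Dict[str, str]) -> List[str]:
    """Old list names whose listinfo redirect points at a list that is not active."""
    return sorted(old for old, new in redirects.items() if new not in active)


if __name__ == "__main__":
    archives = {"wikimedia-l", "foundation-l", "example-l"}  # example-l is made up
    active = {"wikimedia-l"}
    redirects = {"foundation-l": "wikimedia-l"}
    print(find_orphan_archives(archives, active, redirects))
    print(find_broken_redirects(active, redirects))
```

Anything reported as an orphan would need a decision before archive redirects are added across the board.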

Also, shame on me for live-hacking and not using Labs. I have been called out (quite correctly ;)

I did this for wikidata-l -> wikidata with https://gerrit.wikimedia.org/r/#/c/304055/

It was separately requested in T136798

RobH removed RobH as the assignee of this task. (Jun 1 2017, 5:16 PM)
RobH claimed this task.

I no longer think doing this across the board is a good idea. Sometimes links intentionally point to the old archives, since those won't ever be re-indexed and break. In other cases, redirecting is a great thing and makes things far clearer.

Either way, this should likely be handled case by case. When applicable, it makes things look cleaner and allows the old archives to be removed entirely.