
Enormous mailman3 outgoing queue
Closed, Resolved (Public)

Description

Screenshot_2021-05-30 Mailman3 - Grafana.png (955×1 px, 69 KB)

We got a bunch of bounces, and those users were unsubscribed (on the order of 7k!). For each one, mailman3 sends an email to the list owner saying the address was unsubscribed, and then one to the address itself, which probably results in another bounce. There's also some issue when a list owner is bouncing.

The outgoing queue is also processing very slowly. I think there's another bad/unoptimized query here: it's sending 1 email every 5 seconds. At this rate it'll take 70 hours to empty the queue.
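As a rough check of that estimate, here is a minimal sketch of the arithmetic. The only figures taken from this task are the rate (1 email every 5 seconds), the 70-hour estimate, and the 25k queue size mentioned in a later comment. The function names are made up for illustration. Working backwards, 70 hours at that rate implies a queue of about 50k messages, a number the task does not state directly.

```python
SECONDS_PER_MESSAGE = 5  # observed send rate from the description


def drain_hours(queue_length, seconds_per_message=SECONDS_PER_MESSAGE):
    """Hours needed to empty a queue of `queue_length` messages."""
    return queue_length * seconds_per_message / 3600


def implied_queue_length(hours, seconds_per_message=SECONDS_PER_MESSAGE):
    """Queue length that would take `hours` to drain at the given rate."""
    return hours * 3600 / seconds_per_message


if __name__ == "__main__":
    # Queue size implied by the 70-hour estimate.
    print(implied_queue_length(70))
    # Drain time for the 25k messages mentioned later in the task.
    print(round(drain_hours(25_000), 1))
```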

We should just delete all the "You have been unsubscribed" emails from the queue. I don't know whether they can simply be deleted from disk or whether they also need to be removed from the database. Maybe there's a REST API method for it.

Event Timeline

Legoktm triaged this task as High priority. May 31 2021, 6:58 AM
Legoktm created this task.

Mentioned in SAL (#wikimedia-operations) [2021-05-31T07:23:00Z] <legoktm> deleting all outgoing list mail that has a subject that starts with "You have been unsubscribed from the" T284003

Mentioned in SAL (#wikimedia-operations) [2021-05-31T07:30:45Z] <legoktm> deleted all outoing list mail that is for a yahoo/aol address being unsubscribed T284003

Mentioned in SAL (#wikimedia-operations) [2021-05-31T07:32:04Z] <legoktm> deleted all outoing list mail that is for a gmail address being unsubscribed T284003

I deleted all messages in the outgoing queue that matched:

# `subject` is the Subject header of one queued message.
should_delete = (
    subject.startswith('You have been unsubscribed from the')
    or '@yahoo.com> unsubscribed from' in subject
    or '@yahoo.com unsubscribed from' in subject
    or '@gmail.com> unsubscribed from' in subject
    or '@gmail.com unsubscribed from' in subject
    or '@aol.com> unsubscribed from' in subject
    or '@aol.com unsubscribed from' in subject
)

which brought us down from 25k to 5k messages. The bounce runner crashed again, so it won't add any more messages to the outgoing queue; only normal mail traffic will. I think it's OK to let the bounce queue grow large until we figure out what to do with it... tomorrow.
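For reference, the filter above could be wrapped in a small standalone function. This is only a sketch: the function and constant names are hypothetical, and how each queued message's subject was read from the outgoing queue is not shown here because the task does not describe it.

```python
# Prefix of the notice mailman3 sends to the unsubscribed address itself.
UNSUBSCRIBE_NOTICE_PREFIX = "You have been unsubscribed from the"

# Domains whose unsubscribe notifications to list owners were purged.
PURGED_DOMAINS = ("yahoo.com", "gmail.com", "aol.com")


def should_delete(subject: str) -> bool:
    """Return True if a queued message with this subject matches the purge filter."""
    if subject.startswith(UNSUBSCRIBE_NOTICE_PREFIX):
        return True
    for domain in PURGED_DOMAINS:
        # Match the address both with and without a closing angle bracket.
        if f"@{domain}> unsubscribed from" in subject:
            return True
        if f"@{domain} unsubscribed from" in subject:
            return True
    return False


if __name__ == "__main__":
    for s in (
        "You have been unsubscribed from the Example mailing list",
        "someone@yahoo.com unsubscribed from example",
        "Weekly meeting notes",
    ):
        print(should_delete(s), s)
```

Keeping the domains in one tuple makes it easier to extend the filter without repeating both string variants by hand.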

Legoktm claimed this task.

So we were running into an exception (T282348#7124014). When the bounce runner crashed, it rolled back the transaction and un-unsubscribed all the users (but obviously couldn't roll back the email notifications). Each time I restarted the bounce runner, it reprocessed everything and sent out more emails, but made no actual progress.
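The failure mode can be illustrated with a toy simulation. This is not Mailman's code: it uses an in-memory SQLite table in place of the membership database and a plain list in place of the outgoing queue, and every name in it is made up for illustration. It only shows why a database rollback cannot undo notifications that were already queued.

```python
import sqlite3


def process_bounces(conn, outbox, crash=True):
    """One simulated bounce-runner pass: unsubscribe everyone, notify, maybe crash."""
    addresses = [row[0] for row in conn.execute("SELECT address FROM members")]
    try:
        for address in addresses:
            conn.execute("DELETE FROM members WHERE address = ?", (address,))
            # Side effect outside the database: a rollback will not remove it.
            outbox.append(f"{address} unsubscribed")
        if crash:
            raise RuntimeError("simulated bounce runner crash")
        conn.commit()
    except RuntimeError:
        # The unsubscribes are undone, but the queued notifications remain.
        conn.rollback()


def simulate(restarts):
    """Return (members still subscribed, notifications queued) after N restarts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (address TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO members VALUES (?)",
        [("a@example.org",), ("b@example.org",)],
    )
    conn.commit()
    outbox = []
    for _ in range(restarts):
        process_bounces(conn, outbox)
    remaining = conn.execute("SELECT COUNT(*) FROM members").fetchone()[0]
    conn.close()
    return remaining, len(outbox)


if __name__ == "__main__":
    print(simulate(3))
```

In this sketch, every restart adds another full set of notifications while the member table ends up unchanged, which matches the "more emails, no progress" behaviour described above.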

The crash is hotpatched now and the out queue has mostly recovered (under 1k) and should be 0 in an hour or so.