Lengthy delays in emails being received from mailing lists in October 2019
Closed, Resolved · Public · Bug Report

Description

I noticed yesterday a delay between when emails are being sent to Wikimedia-l and when they are arriving in my inbox. Today the email https://lists.wikimedia.org/pipermail/wikimedia-l/2019-October/093736.html was sent at Sun Oct 20 06:25:45 UTC 2019 but it apparently arrived in my inbox much later. Google says "Delivered after 9538 seconds". The email https://lists.wikimedia.org/pipermail/wikimedia-l/2019-October/093738.html appears to have been sent at Sun Oct 20 15:14:16 UTC 2019 but as of the time of me submitting this task on Phabricator the email is not in my inbox. On a different mailing list, https://lists.wikimedia.org/pipermail/wikimedia-cascadia/2019-October/001258.html was "Delivered after 16869 seconds" and https://lists.wikimedia.org/pipermail/wikimedia-cascadia/2019-October/001259.html has not arrived in my inbox.

Emails from several days ago also had significant delays, though less severe ones. https://lists.wikimedia.org/pipermail/wikitech-l/2019-October/092679.html arrived with a delay of 1014 seconds, and https://lists.wikimedia.org/pipermail/wikitech-l/2019-October/092660.html arrived after a delay of 1603 seconds.

Earlier emails were delivered much faster. https://lists.wikimedia.org/pipermail/wikitech-l/2019-October/092606.html had a delay of 23 seconds, and https://lists.wikimedia.org/pipermail/wikidata/2019-September/013562.html had a delay of 63 seconds.

I think that a delay of 1014 seconds is too long, and 16869 seconds is far too long. Is this problem on Google's end or on WMF's end?
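For readers who want to put the figures above in more familiar units, here is a minimal sketch that converts a "Delivered after N seconds" value into hours, minutes, and seconds and estimates the arrival time. It assumes the delay is counted from the timestamp shown in the pipermail archive, which may not be exactly how Google computes it; the function names `as_hms` and `arrival_time` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Timestamp layout used in the archive pages quoted above,
# e.g. "Sun Oct 20 06:25:45 UTC 2019".
ARCHIVE_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def as_hms(seconds: int) -> str:
    """Render a delay in seconds as hours, minutes, and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


def arrival_time(sent: str, delay_seconds: int) -> datetime:
    """Estimate arrival time, ASSUMING the delay starts at the archive timestamp."""
    sent_at = datetime.strptime(sent, ARCHIVE_FORMAT).replace(tzinfo=timezone.utc)
    return sent_at + timedelta(seconds=delay_seconds)


if __name__ == "__main__":
    for delay in (23, 63, 1014, 1603, 9538, 16869):
        print(delay, "seconds =", as_hms(delay))
    print(arrival_time("Sun Oct 20 06:25:45 UTC 2019", 9538).isoformat())
```

Under that assumption, the 9538-second delay corresponds to roughly 2 hours 39 minutes, and the 16869-second delay to roughly 4 hours 41 minutes.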

Event Timeline

Restricted Application added a subscriber: Aklapper.
Pine changed the subtype of this task from "Task" to "Bug Report". Oct 20 2019, 9:13 PM

I checked the delays for a few emails that I get from non-WMF mailing lists, and the maximum delay that I saw was under 4 minutes, so I think that there is likely a bug on WMF's end and I'm changing the task subtype for this Phab task to "Bug Report".

I've been monitoring this for the past couple of days. Since yesterday the queue has gone from over 20k messages to fewer than 6k. The backlog seems to be coming from a particular provider's rate limiting. Samples from my inbox show delays at two points: when Google relays the message to the list, and between the list server and the outbound mail relay. The combined effect produces the delay you're seeing.

A large backlog might exacerbate the delays you're seeing but it is not clear if that is the cause. I will continue to monitor the situation.
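To see where along the path a given message waited, one option is to compare the timestamps in its `Received` headers. The following is a rough sketch using only the Python standard library; it assumes every relay adds a well-formed `Received` header with the date after the final semicolon and that the relays' clocks are reasonably in sync. The host names and timestamps in `SAMPLE` are made up for illustration and are not taken from any real message in this task.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def hop_delays(raw_message: str) -> list[float]:
    """Return the seconds elapsed between consecutive Received headers."""
    msg = message_from_string(raw_message)
    # Each relay prepends its own header, so the newest hop comes first.
    received = msg.get_all("Received", [])
    stamps = []
    for header in reversed(received):  # walk from oldest hop to newest
        # The timestamp is the part after the last ';' in the header.
        _, _, date_part = header.rpartition(";")
        stamps.append(parsedate_to_datetime(date_part.strip()))
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(stamps, stamps[1:])
    ]


# Hypothetical message: hosts and times are invented for this example.
SAMPLE = """\
Received: from relay.example by inbox.example; Sun, 20 Oct 2019 06:40:45 +0000
Received: from lists.example by relay.example; Sun, 20 Oct 2019 06:30:45 +0000
Received: from sender.example by lists.example; Sun, 20 Oct 2019 06:25:45 +0000
Subject: test

body
"""

if __name__ == "__main__":
    print(hop_delays(SAMPLE))
```

In this invented sample the first gap is the time between the list server accepting the message and handing it to the relay, and the second is the time between the relay and the final inbox; the sum of the gaps is the end-to-end delay.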

Confirming that this is a serious issue. Two time-sensitive enwiki mailing lists, Oversight-L and Functionaries-en-L, do not seem to be sending out emails.

EDIT: We have confirmed that messages are being added to the Functionaries-en-L archive, even though there is no indication that they have been sent, and nobody has received them. (Oversight-L is a non-archiving list so we cannot confirm that this is occurring there as well.)

Mentioned in SAL (#wikimedia-operations) [2019-10-24T03:55:57Z] <shdubsh> temporarily turn down accept delay on fermium - T235983

Pine raised the priority of this task from High to Unbreak Now!. Oct 24 2019, 3:24 PM

Does this bug affect emergency@ or legal@? In any case if this is delaying emails to oversighters then I think that UBN priority is appropriate, and I am raising the priority accordingly.

Aklapper renamed this task from Lengthy delays in emails being recieved from mailing lists to Lengthy delays in emails being received from mailing lists in October 2019. Oct 24 2019, 3:25 PM

This bug is also affecting the https://lists.wikimedia.org/mailman/listinfo/traffic-anomaly-report list though it is not a priority as compared to the other lists mentioned above. (Sharing in case it is helpful for debugging.)

This issue has been mitigated as of this morning (UTC), and I can confirm that I am no longer seeing long delays in list email.

I can confirm this is resolved as well. Thank you.