On 2022-09-16 between 06:30 and 14:30, the jjmc89-bot tool had multiple pod failures according to Grafana; however, no emails were received. All jobs have emails: onfailure set.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | BUG REPORT | aborrero | T317998 toolforge-jobs emails not working
Resolved | | aborrero | T320405 toolforge jobs-framework-emailer: increase reliability
Event Timeline
There were more pod failures on 2022-09-28 between 11:39 and 12:20 for which I did not receive emails. (Grafana)
According to a report on IRC, none of the emails are working.
<PeterBowman> hello, are email notifications in toolforge-jobs actually working? I can successfully send messages to myself via tools.tool-name@tools.wmflabs.org, which I think is the address being used by the framework under the hood, but the --emails option has no effect for me
Here are my Grafana logs. I didn't receive any messages for either successful or failing jobs, regardless of the selected value for --emails (onfailure/onfinish/all).
There is something wrong with the emailer daemon:
[..]
2022-08-20 05:00:33 INFO: 1 new pending emails in the queue, new total queue size: 1
2022-08-20 05:00:40 INFO: Sending email FROM: noreply@toolforge.org TO: tools.arkivbot@tools.wmflabs.org via mail.tools.wmflabs.org:25
2022-08-20 05:07:17 INFO: 1 new pending emails in the queue, new total queue size: 1
2022-08-20 05:07:17 INFO: Sending email FROM: noreply@toolforge.org TO: tools.arkivbot@tools.wmflabs.org via mail.tools.wmflabs.org:25
2022-08-20 06:43:01 INFO: 1 new pending emails in the queue, new total queue size: 1
2022-08-20 06:43:01 INFO: Sending email FROM: noreply@toolforge.org TO: tools.earwigbot@tools.wmflabs.org via mail.tools.wmflabs.org:25
Indeed, it has apparently not been sending emails for a couple of months.
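For anyone wanting to check for this kind of stall without reading the log by eye, here is a rough sketch in Python. It assumes only the log format quoted above (timestamp, level, then a "Sending email" message). The function names and the 7-day silence threshold are hypothetical choices for illustration and are not part of the emailer itself.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp of every "Sending email" entry in the quoted log format.
SEND_ENTRY = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) INFO: Sending email")


def last_send_time(log_text):
    """Return the timestamp of the most recent send entry, or None if there is none."""
    stamps = [
        datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        for match in SEND_ENTRY.finditer(log_text)
    ]
    return max(stamps) if stamps else None


def looks_stalled(log_text, now, max_silence=timedelta(days=7)):
    """Guess whether the emailer is stuck: no send entry within max_silence (assumed value)."""
    last = last_send_time(log_text)
    return last is None or now - last > max_silence


# Sample taken from the log excerpt in this task.
SAMPLE = """\
2022-08-20 05:07:17 INFO: 1 new pending emails in the queue, new total queue size: 1
2022-08-20 05:07:17 INFO: Sending email FROM: noreply@toolforge.org TO: tools.arkivbot@tools.wmflabs.org via mail.tools.wmflabs.org:25
2022-08-20 06:43:01 INFO: 1 new pending emails in the queue, new total queue size: 1
2022-08-20 06:43:01 INFO: Sending email FROM: noreply@toolforge.org TO: tools.earwigbot@tools.wmflabs.org via mail.tools.wmflabs.org:25
"""

if __name__ == "__main__":
    print(last_send_time(SAMPLE))
    print(looks_stalled(SAMPLE, now=datetime(2022, 10, 10, 11, 35, 36)))
```

A check like this only shows that the daemon stopped logging send attempts. It says nothing about whether the mail server accepted or delivered the messages.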
Mentioned in SAL (#wikimedia-cloud) [2022-10-10T11:35:36Z] <arturo> aborrero@tools-k8s-control-1:~$ sudo -i kubectl -n jobs-emailer rollout restart deployment/jobs-emailer (T317998)
The emailer component seems to be happy again. Please reopen if required. We will track improvements on subtask T320405: toolforge jobs-framework-emailer: increase reliability.