I have not received any VRT emails in the past two days, despite multiple tickets coming in that should have triggered email notifications (the last email I have is from Feb 16th at 1:13 PM Eastern). I have anecdotal reports of the same from a couple of other users in the VRT IRC channel. Znuny itself seems to be receiving tickets as expected; it's just the email notifications that aren't working.
I can confirm that I and several other users of the oversight-en and checkuser-en queues (at minimum) have seen this. I concur with the timeline identified above. There is currently an enwiki functionaries thread about this.
We think this is a bigger issue: it seems that no user can receive agents' replies from VRT (checked at info-zh@wikimedia.org).
I did a quick test, and it seems the sender received neither the auto-reply nor the agent's response email. I am not sure whether this is related to the notification mail.
I noticed some DB errors in the logs. After manually checking that the DB config was correct, I restarted otrs-daemon.service and things look healthy again. Can anyone confirm whether the issue is fixed?
I do see the following error in the log, but it looks like it can wait for someone more knowledgeable about OTRS/Znuny than I am to look into:
Feb 20 15:01:28 otrs1001 OTRS-CGI-10[21026]: [Error][Kernel::System::Web::InterfaceAgent::Run][Line:1172]: PerformanceLog file '/opt/otrs/var/log/Performance.log' is too large, you need to reset it in PerformanceLog page!
@Dzahn This seems to be related to https://sal.toolforge.org/log/scLeA38B1jz_IcWuCEVv
@jbond Thanks for the quick fix. I confirm emails started flowing back to my mailbox again :).
I can also confirm that a new wave of emails just started. Thanks for looking into this, @jbond.
I fixed that one: I've disabled the performance log. Usage of it never panned out anyway.
@jbond thanks for dealing with this!
Thank you @jbond! We did talk about the log entries on Friday, but as you say, the DB config looked correct.
There was a short outage of OTRS on Feb 16th. The config change I deployed was not the original cause; it was meant to fix that outage. I will share more details next week.
I gave people on #wikimedia-vrt what is basically the summary of the incident report, since some were wondering why they got all their mails at once, etc. There is a doc being worked on. Once that is ready, we should put it on wikitech.