
Error while sending emails with OTRS
Closed, ResolvedPublic

Description

(Possibly linked to T74109 ?)

I encountered the following while trying to answer an e-mail on OTRS:

Backend ERROR: OTRS-CGI-10 Perl: 5.20.2 OS: linux Time: Thu Feb  4 00:41:43 2016

 Message: Impossible to send message to: XXX@XXX .

 RemoteAddress: XX.XX.XX.XX
 RequestURI: /otrs/index.pl

 Traceback (28156): 
   Module: Kernel::System::Ticket::Article::ArticleSend Line: 2253
   Module: Kernel::Modules::AgentTicketCompose::Run Line: 839
   Module: Kernel::System::Web::InterfaceAgent::Run Line: 1042
   Module: ModPerl::ROOT::ModPerl::Registry::opt_otrs_bin_cgi_2dbin_index_2epl::handler Line: 40
   Module: (eval) (v1.99) Line: 207
   Module: ModPerl::RegistryCooker::run (v1.99) Line: 207
   Module: ModPerl::RegistryCooker::default_handler (v1.99) Line: 173
   Module: ModPerl::Registry::handler (v1.99) Line: 32

Event Timeline

JeanFred raised the priority of this task from to High.
JeanFred updated the task description. (Show Details)
JeanFred added a project: OTRS.
JeanFred added a subscriber: JeanFred.

Another one here -

Backend ERROR: OTRS-CGI-10 Perl: 5.20.2 OS: linux Time: Thu Feb 4 00:55:53 2016

Message: Impossible to send message to: xxx@xxx.

RemoteAddress: xx.xx.xx.xx
RequestURI: /otrs/index.pl

Traceback (32296):

Module: Kernel::System::Ticket::Article::ArticleSend Line: 2253
Module: Kernel::Modules::AgentTicketCompose::Run Line: 839
Module: Kernel::System::Web::InterfaceAgent::Run Line: 1042
Module: ModPerl::ROOT::ModPerl::Registry::opt_otrs_bin_cgi_2dbin_index_2epl::handler Line: 40
Module: (eval) (v1.99) Line: 207
Module: ModPerl::RegistryCooker::run (v1.99) Line: 207
Module: ModPerl::RegistryCooker::default_handler (v1.99) Line: 173
Module: ModPerl::Registry::handler (v1.99) Line: 32

Confirmed, just tried to send an email to myself, got the same error message and never received the email. (OTRS still created an article for it, though, https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=8965675.) Note that mail delivery wasn't always broken after the upgrade; e.g. there was an outbound email in https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=8965589 from Feb 3, 23:49 UTC for which we received the customer's reply a few minutes later.

Just tried another one, same result.

I debugged a little and found these:

79320 Feb 4 01:10:23 mendelevium OTRS-CGI-10[32296]: [Error][Kernel::System::Email::Sendmail::Send][Line:85]: Can't send message: Cannot allocate memory!

..

79324 Feb 4 01:10:24 mendelevium OTRS-CGI-10[32296]: [Error][Kernel::System::Email::Sendmail::Send][Line:85]: Can't send message: Cannot allocate memory!

:/
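
The "Cannot allocate memory" text is the operating system's ENOMEM error string as seen through Perl's $!; it most likely means the process forked to pipe the message to the sendmail binary could not be created under memory pressure. A minimal sketch of that failure mode (not taken from the OTRS code base; the sendmail path and addresses are assumptions for illustration):

```perl
#!/usr/bin/perl
# Minimal sketch (not OTRS code): pipe a message to the sendmail binary the
# way a Sendmail-style mail backend typically does. If fork() fails under
# memory pressure, $! holds "Cannot allocate memory", matching the log above.
use strict;
use warnings;

my $sendmail = '/usr/sbin/sendmail -i recipient@example.org';  # path assumed

open( my $pipe, '|-', $sendmail )
    or die "Can't send message: $!\n";   # e.g. "Cannot allocate memory" (ENOMEM)

print {$pipe} "To: recipient\@example.org\n";
print {$pipe} "Subject: delivery test\n\n";
print {$pipe} "test body\n";

close $pipe or warn "sendmail exited with non-zero status: $?\n";
```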

also "otrs.PostMaster.pl is deprecated, please use console command 'Maint::PostMaster::Read' instead."

RD (one of the OTRS admins) asked me to edit the motd template to announce the current issues.

I found it in /opt/otrs/Kernel/Output/HTML/Templates/Standard/Motd.tt and edited the template directly.

Admins could then use the admin UI to turn on the motd and have it displayed.

We also agreed with "pajz" and RD that since only outgoing mail is affected this can wait until tomorrow but is obviously still high prio.

Looking at https://ganglia.wikimedia.org/latest/graph.php?h=mendelevium.eqiad.wmnet&m=cpu_report&r=custom&s=by%20name&hc=4&mc=2&cs=2%2F3%2F2016%2014%3A21&ce=2%2F4%2F2016%208%3A2&st=1454580197&g=mem_report&z=medium&c=Miscellaneous%20eqiad

reveals the reason. Seems like the new OTRS version has greater memory requirements than the previous versions. I increased the memory of the VM to 8G and rebooted it to apply the change.

Setting this to stalled while we monitor memory usage, to decide whether that's sufficient. I am hoping we don't encounter a memory leak bug.
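
(For reference, a minimal way to keep an eye on the host's memory alongside the Ganglia graphs; this is an illustrative sketch assuming a Linux host with /proc/meminfo, not a tool used in this task.)

```perl
#!/usr/bin/perl
# Illustrative one-off: print total and available memory from /proc/meminfo
# so the figures can be compared against the Ganglia charts.
use strict;
use warnings;

open my $fh, '<', '/proc/meminfo' or die "Cannot read /proc/meminfo: $!\n";
my %mem;
while (<$fh>) {
    $mem{$1} = $2 if /^(\w+):\s+(\d+)\s+kB/;   # e.g. "MemTotal:  8174592 kB"
}
close $fh;

printf "MemTotal: %d MB, MemAvailable: %d MB\n",
    $mem{MemTotal} / 1024, ( $mem{MemAvailable} // $mem{MemFree} ) / 1024;
```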

akosiaris changed the task status from Open to Stalled.Feb 4 2016, 10:15 AM

I've disabled the motd for now. The system seems stable and memory usage is still low.

Memory usage appears to have stabilized at 4G. Way more than the previous version but acceptable nonetheless. I will monitor it for another day though.

pajz changed the task status from Stalled to Open.Feb 5 2016, 12:13 AM

Note that according to the Ganglia chart @akosiaris has linked to, memory usage has just hit 6 GB -- 7 GB including cache (dunno if that's important in this context). Looking at Ganglia's 1d chart for mendelevium, I also note that the amount of memory used has been essentially ever-increasing over the past ~12h, and indeed increased by 1 GB in the last 4 hours, so I wonder if we'll run into the same issues as before in 4 or 8 hours' time.

Unfortunately, it looks pretty obvious we have a memory leak. It's a relatively slow one as @pajz noted, about 200-250 MB per hour. For now I've installed a cron job that restarts Apache every 12 hours, just to avoid immediate problems like the ones noted on this ticket, but this needs to be fixed.

We need to report it upstream (I'll do that ASAP), gather some input from them (maybe it's a new issue, /me just hoping) and think about solutions.
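
(A rough sanity check of the mitigation, using only numbers already mentioned in this task: the host settles around 4 GB on the resized 8 GB VM, so at the observed 200-250 MB/h leak rate the remaining headroom lasts roughly 16-20 hours, and a 12-hour Apache restart should keep usage comfortably below the limit. A small sketch of that arithmetic:)

```perl
#!/usr/bin/perl
# Back-of-the-envelope estimate of how long the host can run before
# exhausting memory, given the leak rate observed above.
use strict;
use warnings;

my $total_mb    = 8192;   # VM size after the resize
my $baseline_mb = 4096;   # usage observed once OTRS settles
my $leak_mb_h   = 250;    # pessimistic end of the observed 200-250 MB/h range

my $headroom_mb = $total_mb - $baseline_mb;
my $hours_left  = $headroom_mb / $leak_mb_h;

printf "Headroom: %d MB, exhausted after ~%.1f hours at %d MB/h\n",
    $headroom_mb, $hours_left, $leak_mb_h;
```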

I am gonna close this as resolved since we know what caused it and have a mitigation in place. The actual memory leak is tracked in T126448 now.