Page MenuHomePhabricator

OTRS has been eating 100% of mendelevium's CPU for the last fortnight
Closed, ResolvedPublic

Event Timeline

At least, I assume it's OTRS.

If it's not OTRS, it's directly affecting OTRS.

I just ran a memory intensive full-text search that I last ran in mid-March. Back then the search took about fifteen seconds, typical for full-text searches. The same search ran just now took 56 seconds to complete.

Oh it is OTRS. There are multiple /opt/otrs/bin/cgi processes consuming close to 100% of CPU time. stracing returns nothing as well as ltrace. This is starting to look like a forever loop

I 've killed multiple ones to restore service performance. I 've left a couple running to investigate more.

Documenting what I 've seen up to now.

Choosing 20928 process

strace -fp 20928 returns nothing
ltrace -fp 20928 returns nothing

This is starting to smell a lot like a forever loop

perf stat -p 20928 sleep 20

 Performance counter stats for process id '20928':

      19927.257788      task-clock (msec)         #    0.996 CPUs utilized           [100.00%]
               228      context-switches          #    0.011 K/sec                   [100.00%]
                 2      cpu-migrations            #    0.000 K/sec                   [100.00%]
                 0      page-faults               #    0.000 K/sec                  
                 0      cycles                    #    0.000 GHz                    
                 0      stalled-cycles-frontend   #    0.00% frontend cycles idle   
                 0      stalled-cycles-backend    #    0.00% backend  cycles idle   
                 0      instructions             
                 0      branches                  #    0.000 K/sec                  
                 0      branch-misses             #    0.000 K/sec

There are some context-switches though

perf top is reporting perl eating away at CPU but nothing OTRS specific shows up there.

perf top
 28.14%  libperl.so.5.20.2         [.] Perl_foldEQ_utf8_flags
  22.30%  libperl.so.5.20.2         [.] Perl_hv_common
   6.18%  libperl.so.5.20.2         [.] Perl_re_intuit_start
   5.16%  libperl.so.5.20.2         [.] Perl_swash_fetch
   4.01%  libperl.so.5.20.2         [.] 0x0000000000127828

gdb output is quite interesting

#0  0x00007fe5749547f0 in S_perl_hash_one_at_a_time_hard (seed=<optimized out>, len=8, str=0x7fe574a1c119 "ALS") at hv_func.h:513
#1  Perl_hv_common (my_perl=<optimized out>, hv=0x7fe57cb44a70, keysv=0x0, key=0x7fe574a1c114 "SPECIALS", klen=8, flags=0, action=32, 
    val=0x0, hash=<optimized out>) at hv.c:624
#2  0x00007fe5749d0773 in Perl_to_utf8_case (my_perl=0x7fe579aa8ed0, 
    p=0x7fe58b4a3840 "\224<o:p></o:p></span></b></p>\n\n<p class=MsoNoSpacing style='text-align:justify'><o:p>&nbsp;</o:p></p>\n\n<p class=MsoNoSpacing style='text-align:justify'>Saudi Arabia which faces\ninevitable military, poli"..., ustrp=0x7ffcb500f000 "o", 
    lenp=0x7fe574a1c11a, swashp=0x7fe579aa9780, normal=0x0, special=0x7fe5749f406f "") at utf8.c:2042
#3  0x00007fe5749d129d in Perl__to_utf8_fold_flags (my_perl=0x7fe579aa8ed0, 
    p=0x7fe58b4a3840 "\224<o:p></o:p></span></b></p>\n\n<p class=MsoNoSpacing style='text-align:justify'><o:p>&nbsp;</o:p></p>\n\n<p class=MsoNoSpacing style='text-align:justify'>Saudi Arabia which faces\ninevitable military, poli"..., ustrp=0x7ffcb500f000 "o", 
    lenp=0x7ffcb500eff0, flags=2 '\002') at utf8.c:2397
#4  0x00007fe5749d2f88 in Perl_foldEQ_utf8_flags (my_perl=0x7fe579aa8ed0, s1=<optimized out>, pe1=0x7ffcb500f138, l1=<optimized out>, 
    u1=true, s2=<optimized out>, pe2=0x0, l2=5, u2=false, flags=0) at utf8.c:4243
#5  0x00007fe5749c6a1d in S_find_byclass (my_perl=0x7fe574a1c11c, my_perl@entry=0x7fe579aa8ed0, prog=0x7fe579aa8ed0, c=0x5, 
    s=0x7fe58b4a3840 "\224<o:p></o:p></span></b></p>\n\n<p class=MsoNoSpacing style='text-align:justify'><o:p>&nbsp;</o:p></p>\n\n<p class=MsoNoSpacing style='text-align:justify'>Saudi Arabia which faces\ninevitable military, poli"..., 
    strend=0x7fe58b4a3844 "p></o:p></span></b></p>\n\n<p class=MsoNoSpacing style='text-align:justify'><o:p>&nbsp;</o:p></p>\n\n<p class=MsoNoSpacing style='text-align:justify'>Saudi Arabia which faces\ninevitable military, politica"..., reginfo=0x0, 
    reginfo@entry=0x7ffcb500f210) at regexec.c:1839
#6  0x00007fe5749c9619 in Perl_re_intuit_start (my_perl=0x7fe579aa8ed0, rx=<optimized out>, sv=<optimized out>, strbeg=<optimized out>, 
    strpos=0x7fe58b4a3730 "<!DOCTYPE html><html>\n\n<head>\n\n</head>\n\n<body lang=EN-US style='tab-interval:36.0pt'>\n\n<div class=WordSection1>\n\n<p class=MsoNoSpacing align=center style='text-align:center'><b><span\nstyle='font-size:"..., strend=0x7fe58b4a56ce "", flags=1, 
    data=0x0) at regexec.c:1287
#7  0x00007fe5749ca66f in Perl_regexec_flags (my_perl=0x7fe579aa8ed0, rx=0x7fe57ce30c18, 
    stringarg=0x191d6b7 <error: Cannot access memory at address 0x191d6b7>, strend=0x7fe58b4a56ce "", 
    strbeg=0x8 <error: Cannot access memory at address 0x8>, minend=0, sv=0x7fe583942cf0, data=0x0, flags=1) at regexec.c:2535
#8  0x00007fe57495f075 in Perl_pp_subst (my_perl=0x7fe579aa8ed0) at pp_hot.c:2130
#9  0x00007fe5749595e6 in Perl_runops_standard (my_perl=0x7fe579aa8ed0) at run.c:42
#10 0x00007fe5748e24d5 in Perl_call_sv (my_perl=<optimized out>, sv=0x7fe579da8ef0, flags=<optimized out>) at perl.c:2771
#11 0x00007fe574c6a013 in modperl_callback () from /usr/lib/apache2/modules/mod_perl.so
#12 0x00007fe574c6aa66 in modperl_callback_run_handlers () from /usr/lib/apache2/modules/mod_perl.so
#13 0x00007fe574c6b16f in modperl_callback_per_dir () from /usr/lib/apache2/modules/mod_perl.so
#14 0x00007fe574c6514b in ?? () from /usr/lib/apache2/modules/mod_perl.so
#15 0x00007fe574c65377 in modperl_response_handler_cgi () from /usr/lib/apache2/modules/mod_perl.so
#16 0x00007fe578c8b2a0 in ap_run_handler ()
#17 0x00007fe578c8b7e9 in ap_invoke_handler ()
#18 0x00007fe578ca1682 in ap_process_async_request ()ssssssefsdf
#19 0x00007fe578ca1820 in ap_process_request ()
#20 0x00007fe578c9e122 in ?? ()
#21 0x00007fe578c94b10 in ap_run_process_connection ()
#22 0x00007fe57509e7ba in ?? () from /usr/lib/apache2/modules/mod_mpm_prefork.so
#23 0x00007fe57509ea01 in ?? () from /usr/lib/apache2/modules/mod_mpm_prefork.so
#24 0x00007fe57509f667 in ?? () from /usr/lib/apache2/modules/mod_mpm_prefork.so
#25 0x00007fe578c70e7e in ap_run_mpm ()
#26 0x00007fe578c6a3c3 in main ()

Seems like some UTF8 ( ? )HTML is causing a forever loop in Perl_re_intuit_start and called functions

Code for the version we are running is at: http://perl5.git.perl.org/perl.git/blob/c05ce579b6dae0f955ff8922469d49f95f4bf6fd:/regexec.c

Code seems to be looping between lines 1365 -> goto (YES goto restart) 847 -> and down again there calling all functions along the way. There 's a lot of goto statements in that code

I 'll have a look at the git history for that file, maybe the bug has been fixed.

Would it make sense to try to trace this down to the affected ticket? I just googled the text bit ("Saudi Arabia which faces inevitable military, politica") and it seems points to a "Voice of Bahrain" newsletter (http://english.voiceofbahrain.org/wp-content/uploads/2016/04/vob290.pdf) from this month by the "Bahrain Freedom Movement." There are indeed a few emails from the "Bahrain Freedom Movement" (info@vob.org), apparently containing such newsletters, e.g. "BFM Arabic Statement 15th April 2016", ticket 2016041510010245, but also others from past weeks.

2016-04-16: Slightly edited for context

Code seems to be looping between lines 1365 -> goto (YES goto restart) 847 -> and down again there calling all functions along the way. There 's a lot of goto statements in that code

I 'll have a look at the git history for that file, maybe the bug has been fixed.

Maybe fixed in 22b433eff9a1ffa2454e18405a56650f07b385b5 ?

The regex engine got into an infinite loop because of the malformation.
It is trying to back-up over a sequence of UTF-8 continuation bytes.
But the character just before the sequence should be a start byte.

2016041510010245 does indeed contain invalid utf-8 sequences:

«=D9=85=A0 =D8=A7…»

and 12 lines below:
«=D8=A8.=A0…»

Those standalone A0 bytes are not valid utf-8.

Looks like if something intended to add a non-breaking space, and inserted it in iso-8859-1 instead of utf-8 (should have been «=C2=A0»)

Would it make sense to try to trace this down to the affected ticket? I just googled the text bit ("Saudi Arabia which faces inevitable military, politica") and it seems points to a "Voice of Bahrain" newsletter (http://english.voiceofbahrain.org/wp-content/uploads/2016/04/vob290.pdf) from this month by the "Bahrain Freedom Movement." There are indeed a few emails from the "Bahrain Freedom Movement" (info@vob.org), apparently containing such newsletters, e.g. "BFM Arabic Statement 15th April 2016", ticket 2016041510010245, but also others from past weeks.

2016-04-16: Slightly edited for context

I can confirm that at least viewing that specific article is indeed causing the issue to appear. Since the article is closed-no response needed there must be others as well. But I think it will prove useful in reproducing this and verifying whether the commit @Platonides pasted is a fix or not

Got a perl stacktrace as well

[Sat Apr 16 17:24:47 2016] (eval 1714): Hi at (eval 1714) line 1.
[Sat Apr 16 17:24:47 2016] (eval 1714): 	eval 'require Carp;Carp::cluck(\'Hi\')' called at /opt/otrs/Kernel/System/HTMLUtils.pm line 715
[Sat Apr 16 17:24:47 2016] (eval 1714): 	Kernel::System::HTMLUtils::DocumentCleanup(Kernel::System::HTMLUtils=HASH(0x7fe57ab49538), "String", "<html>\x{a}\x{a}<head>\x{a}\x{a}</head>\x{a}\x{a}<body lang=EN-US style='tab-interval"...) called at /opt/otrs/Kernel/Output/HTML/Layout.pm line 4437
[Sat Apr 16 17:24:47 2016] (eval 1714): 	Kernel::Output::HTML::Layout::RichTextDocumentCleanup(Kernel::Output::HTML::Layout=HASH(0x7fe58363c718), "String", "<html>\x{a}\x{a}<head>\x{a}\x{a}</head>\x{a}\x{a}<body lang=EN-US style='tab-interval"...) called at /opt/otrs/Kernel/Output/HTML/Layout.pm line 4284
[Sat Apr 16 17:24:47 2016] (eval 1714): 	Kernel::Output::HTML::Layout::RichTextDocumentServe(Kernel::Output::HTML::Layout=HASH(0x7fe58363c718), "Data", HASH(0x7fe57ab65460), "URL", "Action=AgentTicketAttachment;Subaction=HTMLView;ArticleID=108"..., "Attachments", HASH(0x7fe57ab60bb0), "LoadExternalImages", 0, ...) called at /opt/otrs//Kernel/Modules/AgentTicketAttachment.pm line 188
[Sat Apr 16 17:24:47 2016] (eval 1714): 	Kernel::Modules::AgentTicketAttachment::Run(Kernel::Modules::AgentTicketAttachment=HASH(0x7fe583033d60)) called at /opt/otrs/Kernel/System/Web/InterfaceAgent.pm line 1042
[Sat Apr 16 17:24:47 2016] (eval 1714): 	Kernel::System::Web::InterfaceAgent::Run(Kernel::System::Web::InterfaceAgent=HASH(0x7fe57ab56958)) called at /opt/otrs/bin/cgi-bin/index.pl line 40
[Sat Apr 16 17:24:47 2016] (eval 1714): 	ModPerl::ROOT::ModPerl::Registry::opt_otrs_bin_cgi_2dbin_index_2epl::handler(Apache2::RequestRec=SCALAR(0x7fe58363c868)) called at /usr/lib/x86_64-linux-gnu/perl5/5.20/ModPerl/RegistryCooker.pm line 207
[Sat Apr 16 17:24:47 2016] (eval 1714): 	eval {...} called at /usr/lib/x86_64-linux-gnu/perl5/5.20/ModPerl/RegistryCooker.pm line 207
[Sat Apr 16 17:24:47 2016] (eval 1714): 	ModPerl::RegistryCooker::run(ModPerl::Registry=HASH(0x7fe57ab4dbd8)) called at /usr/lib/x86_64-linux-gnu/perl5/5.20/ModPerl/RegistryCooker.pm line 173
[Sat Apr 16 17:24:47 2016] (eval 1714): 	ModPerl::RegistryCooker::default_handler(ModPerl::Registry=HASH(0x7fe57ab4dbd8)) called at /usr/lib/x86_64-linux-gnu/perl5/5.20/ModPerl/Registry.pm line 32
[Sat Apr 16 17:24:47 2016] (eval 1714): 	ModPerl::Registry::handler("ModPerl::Registry", Apache2::RequestRec=SCALAR(0x7fe58363c868)) called at (eval 1714) line 0
[Sat Apr 16 17:24:47 2016] (eval 1714): 	eval {...} called at (eval 1714) line 0

HTMLUtils.pm line 715 is indeed a regular expression matching

https://github.com/OTRS/otrs/blob/rel-5_0_7/Kernel/System/HTMLUtils.pm#L715

Now I am only missing variable content to get a small PoC running trying to reproduce that.

Krenair set Security to Software security bug.Apr 16 2016, 6:22 PM
Krenair added a project: acl*security.
Krenair changed the visibility from "Public (No Login Required)" to "Custom Policy".

potential ddos vector

Finally I have a PoC and managed to reproduce it reliably. Script and data attached

PoC:


data:

Bug is reproducible on any Debian Jessie system and is present in at least perl version 5.18.2. Link to upstream tracker is https://rt.perl.org/Public/Bug/Display.html?id=123562

The patch mentioned by @Platonides has made it into v5.23.3 and on and I had to backport it for 5.20.2 which is in Debian jessie. After testing locally with the PoC, I 've deployed the resulting Debian packages in mendelevium (version 5.20.2-3+deb8u4.1wmf1) and hand upgraded the box, restarting apache afterwards. The ticket 2016041510010245 (found by @pajz) no longer creates a forever loop consuming CPU. However the article is unparseable and the patch fatally croaks. In doing that it does cause a 500 with a page Internal Server Error. Handling that more gracefully might require patches in OTRS code as well (using an eval around that last regexp in https://github.com/OTRS/otrs/blob/rel-5_0_7/Kernel/System/HTMLUtils.pm#L715 seems to work).

I am thinking we can resolve this task and handle the rest in others.

akosiaris changed the visibility from "Custom Policy" to "All Users".
akosiaris changed Security from Software security bug to None.

I 've reopened the bug to everyone now that we are patched in the system in question. I 'll be reporting it to debian/ubuntu to have the patch included in the distro packages

Legoktm changed the visibility from "All Users" to "Public (No Login Required)".Apr 19 2016, 8:11 PM
akosiaris claimed this task.

Bug has been reported upstream

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=821848

and also got a CVE number: CVE-2015-8853

All in all, patch is probably gonna be included in the next DSA. I think this task is close-able.