VCL discards crash varnish frontend child process
Closed, Resolved · Public

Description

While working on T187778 we have observed that varnish frontend child processes crash when their VCL is discarded:

11:04:52 ema@cp5003.eqsin.wmnet:~
$ sudo varnishadm -n frontend vcl.discard vcl-root-94ba9e9b-2257-4702-bb29-5c3d6f07ca09
Feb 23 11:05:06 cp5003 varnishd[4905]: CLI telnet 127.0.0.1 43763 127.0.0.1 6082 Wr 200 -----------------------------
                                       Varnish Cache CLI 1.0
                                       -----------------------------
                                       Linux,4.9.0-0.bpo.5-amd64,x86_64,-junix,-smalloc,-smalloc,-hcritbit
                                       varnish-5.1.3 revision NOGIT
                                       
                                       Type 'help' for command list.
                                       Type 'quit' to close CLI session.
Feb 23 11:05:06 cp5003 varnishd[4905]: CLI telnet 127.0.0.1 43763 127.0.0.1 6082 Rd ping
Feb 23 11:05:06 cp5003 varnishd[4905]: CLI telnet 127.0.0.1 43763 127.0.0.1 6082 Wr 200 PONG 1519383906 1.0
Feb 23 11:05:06 cp5003 varnishd[4905]: CLI telnet 127.0.0.1 43763 127.0.0.1 6082 Rd vcl.discard vcl-root-94ba9e9b-2257-4702-bb29-5c3d6f07ca09
Feb 23 11:05:06 cp5003 varnishd[4905]: Failed to kill child with PID 4914: Operation not permitted
Feb 23 11:05:06 cp5003 varnishd[4905]: CLI telnet 127.0.0.1 43763 127.0.0.1 6082 Wr 200
Feb 23 11:05:06 cp5003 varnishd[4905]: Child (4914) died signal=6
Feb 23 11:05:06 cp5003 varnishd[4905]: Child (4914) Panic at: Fri, 23 Feb 2018 11:05:06 GMT
                                       Assert error in child_sigsegv_handler(), mgt/mgt_child.c line 271:
                                         Condition(Segmentation fault by instruction at (nil)) not true.
                                       version = varnish-5.1.3 revision NOGIT, vrt api = 6.0
                                       ident = Linux,4.9.0-0.bpo.5-amd64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
                                       now = 883625.483118 (mono), 1519383906.483403 (real)
                                       Backtrace:
                                         0x439665: /usr/sbin/varnishd() [0x439665]
                                         0x465bda: /usr/sbin/varnishd() [0x465bda]
                                         0x7f701e552890: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f701e552890]
                                         0x7f700d959384: ./vmod_cache/_vmod_netmapper.DQDUKHATYYOESBIV@D@GOQXNE@OUMBDW(vnm_db_destruct+0x4) [0x7f700d959384]  
                                         0x7f700d958e4c: ./vmod_cache/_vmod_netmapper.DQDUKHATYYOESBIV@D@GOQXNE@OUMBDW(+0x1e4c) [0x7f700d958e4c]              
                                         0x7f70099fc9aa: vcl_vcl-root-94ba9e9b-2257-4702-bb29-5c3d6f07ca09.1518710034.308741808/vgc.so(+0x89aa) [0x7f70099fc9aa
                                         0x445694: /usr/sbin/varnishd() [0x445694]                                                                            
                                         0x448991: /usr/sbin/varnishd(VCL_Poll+0x111) [0x448991]                                                              
                                         0x4819fa: /usr/sbin/varnishd() [0x4819fa]                                                                            
                                         0x481d67: /usr/sbin/varnishd() [0x481d67]                                                                            
                                       thread = (cache-main)                                                                                                  
                                       thr.req = (nil) {                                                                                                      
                                       },                                                                                                                     
                                       thr.busyobj = (nil) {                                                                                                  
                                       },                                                                                                                     
Feb 23 11:05:06 cp5003 varnishd[4905]: Child cleanup complete

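The netmapper vmod appears to be involved: the faulting frame in the backtrace is vnm_db_destruct in _vmod_netmapper, reached from the discarded VCL's vgc.so via VCL_Poll.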
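
The panic reports a fault address of (nil) a few bytes into the destructor, which is consistent with a database pointer that was never set being dereferenced during VCL teardown. The first patch set title on this task ("vcl_fini + nonexistent db") points the same way. The following is only a minimal sketch of that class of bug and the usual guard, not the actual libvmod-netmapper code or its fix; struct demo_db, demo_db_new and demo_db_destruct are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vmod's per-VCL database handle. */
#define DEMO_DB_MAGIC 0x6e6d6462u

struct demo_db {
    unsigned magic;     /* sanity marker, checked before teardown */
    size_t n_entries;
    int *entries;
};

/* Allocate a database; returns NULL if allocation fails. */
struct demo_db *
demo_db_new(size_t n_entries)
{
    struct demo_db *db = calloc(1, sizeof *db);

    if (db == NULL)
        return NULL;
    /* Always allocate at least one slot so entries is never NULL. */
    db->entries = calloc(n_entries ? n_entries : 1, sizeof *db->entries);
    if (db->entries == NULL) {
        free(db);
        return NULL;
    }
    db->magic = DEMO_DB_MAGIC;
    db->n_entries = n_entries;
    return db;
}

/*
 * Destructor that tolerates a database that was never created.
 * Returns 1 if something was freed, 0 if db was NULL.
 * Without the NULL check, the magic test below would dereference
 * address 0 and the process would receive SIGSEGV.
 */
int
demo_db_destruct(struct demo_db *db)
{
    if (db == NULL)
        return 0;
    assert(db->magic == DEMO_DB_MAGIC);
    free(db->entries);
    db->magic = 0;
    free(db);
    return 1;
}
```

With a guard like this, a teardown hook can call demo_db_destruct unconditionally, whether or not the database was ever loaded.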

Event Timeline

ema created this task. Feb 23 2018, 11:11 AM
Restricted Application added a project: Operations. Feb 23 2018, 11:11 AM
Restricted Application added a subscriber: Aklapper.
ema triaged this task as High priority. Feb 23 2018, 11:11 AM

Change 413740 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/software/varnish/libvmod-netmapper@master] 1.5: bugfix for vcl_fini + nonexistent db

https://gerrit.wikimedia.org/r/413740

Change 413740 merged by Ema:
[operations/software/varnish/libvmod-netmapper@master] 1.6: bugfix for vcl_fini + set thread name

https://gerrit.wikimedia.org/r/413740

Change 413754 had a related patch set uploaded (by Ema; owner: Ema):
[operations/software/varnish/libvmod-netmapper@debian] 1.6-1: bugfix for vcl_fini + set thread name

https://gerrit.wikimedia.org/r/413754

Change 413754 merged by Ema:
[operations/software/varnish/libvmod-netmapper@debian] 1.6-1: bugfix for vcl_fini + set thread name

https://gerrit.wikimedia.org/r/413754

Mentioned in SAL (#wikimedia-operations) [2018-02-23T17:22:22Z] <ema> libvmod-netmapper 1.6-1 uploaded to apt.w.o/experimental T188089

BBlack closed this task as Resolved. Feb 27 2018, 11:25 PM
BBlack claimed this task.