Change Details

The issue described in https://github.com/varnishcache/varnish-cache/issues/2560 and various other varnish tickets does not only crash the varnish child when the cold VCL is used; it also crashes the parent process (and the child) if the VCL is discarded.

Steps to reproduce:

  vcl.load vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89 /etc/varnish/wikimedia_misc-frontend.vcl
  vcl.label wikimedia_misc-frontend vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
  # wait for vcl_cooldown seconds plus a bit, till vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89 goes into "cold" state
  vcl.discard vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89

Issuing the `vcl.discard` command kicks the user out of varnishadm, and the varnish service fails as follows:

  Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service: main process exited, code=killed, status=6/ABRT
  Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: Unit varnish-frontend.service entered failed state.
  Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service holdoff time over, scheduling restart.
  Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service failed to schedule restart job: Transaction is destructive.
  Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: Unit varnish-frontend.service entered failed state.

At this point no varnishadm commands can be given, and no traffic gets served. The service needs to be started again.

We've [[https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445357/ | deployed a workaround ]] for the issue of labeled VCLs going cold, and we don't automatically discard cold labeled VCLs anyway. There are, however, a bunch of old cold labeled VCLs currently present on various text nodes. We should restart those varnishes (and certainly not discard the cold VCLs!) as a precaution.

Not to be confused with T188089.
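
To find which nodes still carry cold labeled VCLs before restarting them, something along these lines might help. It is only a rough sketch: the function name `cold_labeled_vcls` is made up for illustration, and it assumes `vcl.list` prints one VCL per line as `status state/temperature busy name`, with label entries shown as `label -> target`. That layout should be checked against the varnish version actually running before relying on it. The sketch only reads text and never issues `vcl.discard`.

```python
from typing import List


def cold_labeled_vcls(vcl_list_output: str) -> List[str]:
    """Return the names of VCLs that are cold and are the target of a label.

    Assumes each line looks like "status state/temperature busy name",
    and that label entries continue with "-> target". This layout is an
    assumption, not a guaranteed format.
    """
    temperature = {}       # VCL name -> temperature ("warm", "cold", ...)
    label_targets = set()  # VCL names that some label points at

    for line in vcl_list_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not match the assumed layout.
        if len(fields) < 4 or "/" not in fields[1]:
            continue
        state, temp = fields[1].split("/", 1)
        name = fields[3]
        if state == "label":
            # A label entry: remember the VCL it points at, if present.
            if "->" in fields:
                arrow = fields.index("->")
                if arrow + 1 < len(fields):
                    label_targets.add(fields[arrow + 1])
        else:
            temperature[name] = temp

    return sorted(n for n in label_targets if temperature.get(n) == "cold")


# Illustrative input only; not captured from a real host.
SAMPLE = """\
active     auto/warm   0 boot
available  auto/cold   0 vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
available  label/warm  0 wikimedia_misc-frontend -> vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
"""

if __name__ == "__main__":
    for name in cold_labeled_vcls(SAMPLE):
        print(name)
```

A non-empty result for a node would mean that varnish instance is a candidate for a restart.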