The issue described in https://github.com/varnishcache/varnish-cache/issues/2560 and various other Varnish tickets does not only crash the Varnish child when the cold VCL is used; it also crashes the parent process (and the child) if the VCL is discarded.
Steps to reproduce:
vcl.load vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89 /etc/varnish/wikimedia_misc-frontend.vcl
vcl.label wikimedia_misc-frontend vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
# wait for vcl_cooldown seconds plus a bit, till vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89 goes into "cold" state
vcl.discard vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
Issuing the vcl.discard command kicks the user out of varnishadm, with the varnish service failing as follows:
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service: main process exited, code=killed, status=6/ABRT
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: Unit varnish-frontend.service entered failed state.
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service holdoff time over, scheduling restart.
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service failed to schedule restart job: Transaction is destructive.
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: Unit varnish-frontend.service entered failed state.
At this point no varnishadm commands can be given, and no traffic gets served. The service needs to be started again.
We've deployed a workaround for the issue of labeled VCLs going cold, and we don't automatically discard cold labeled VCLs anyway. There are, however, a bunch of old cold labeled VCLs currently present on various text nodes.
We should restart those Varnish instances as a precaution (and definitely not discard the cold VCLs!).
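To figure out which nodes need a restart, something along these lines could flag cold VCLs that are still the target of a label. This is only a rough sketch: it assumes `vcl.list` prints lines shaped like `<status> <state>/<temperature> <busy> <name>`, with label lines shaped like `... label/<temperature> <busy> <label> -> <target>`. The exact output differs between Varnish versions, so the parsing would need checking against the real output on our hosts. The function name, the sample text, and `vcl-unlabeled-example` are made up for illustration.

```python
def cold_labeled_vcls(vcl_list_output):
    """Return sorted names of VCLs that are cold and are the target of a label.

    Assumes each line looks like:
        <status> <state>/<temperature> <busy> <name> [-> <target>] ...
    This layout is an assumption and may not match every Varnish version.
    """
    cold = set()
    label_targets = set()
    for line in vcl_list_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or "/" not in fields[1]:
            continue  # skip blank or unrecognised lines
        state, temperature = fields[1].split("/", 1)
        name = fields[3]
        if state == "label":
            # Assumed label line: "... label/warm 0 <label> -> <target>"
            if "->" in fields:
                arrow = fields.index("->")
                if arrow + 1 < len(fields):
                    label_targets.add(fields[arrow + 1])
        elif temperature == "cold":
            cold.add(name)
    return sorted(cold & label_targets)


# Illustrative sample only, not captured from a real host.
SAMPLE = """\
active      auto/warm   0 boot
available   auto/cold   0 vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
available   label/warm  0 wikimedia_misc-frontend -> vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
available   auto/cold   0 vcl-unlabeled-example
"""

if __name__ == "__main__":
    for name in cold_labeled_vcls(SAMPLE):
        print(name)
```

The function only reads text, so it can be fed the captured output of `varnishadm vcl.list` from each text node without touching the running instance. Nothing here issues `vcl.discard`.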
Not to be confused with T188089.