The issue described in https://github.com/varnishcache/varnish-cache/issues/2560 and various other Varnish tickets not only crashes the varnish child when the cold VCL is used; it also crashes the parent process (and the child) if the cold VCL is discarded.
Steps to reproduce:
vcl.load vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89 /etc/varnish/wikimedia_misc-frontend.vcl
vcl.label wikimedia_misc-frontend vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
# wait for vcl_cooldown seconds plus a bit, till vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89 goes into "cold" state
vcl.discard vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
Issuing the `vcl.discard` command kicks the user out of varnishadm, with the varnish service failing as follows:
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service: main process exited, code=killed, status=6/ABRT
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: Unit varnish-frontend.service entered failed state.
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service holdoff time over, scheduling restart.
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service failed to schedule restart job: Transaction is destructive.
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: Unit varnish-frontend.service entered failed state.
At this point no varnishadm commands can be given, and no traffic gets served. The service needs to be started again.
We've [[ https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445357/ | deployed a workaround ]] for the issue of labeled VCLs going cold, and we don't automatically discard cold labeled VCLs anyway. There are, however, a number of old cold labeled VCLs currently present on various text nodes.
As a precaution, we should restart those varnishes. We should definitely not discard the cold VCLs, since that is exactly what triggers the crash.
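To find the affected nodes without touching `vcl.discard`, the saved output of `varnishadm vcl.list` could be scanned for labels whose target VCL is cold. The following is only a rough sketch: the line layout it parses (`<status> <state>/<temperature> <busy> <name> [-> <target>]`) is an assumption based on Varnish 5.x-style output and may differ between releases, the `SAMPLE` text is illustrative rather than captured from a real node, and `cold_labeled_vcls` is a hypothetical helper name.

```python
import re

# Assumed line layout of `vcl.list` (Varnish 5.x style, not verified for all releases):
#   <status> <state>/<temperature> <busy> <name> [-> <target>] [trailing notes]
LINE_RE = re.compile(
    r"^(?P<status>\S+)\s+(?P<state>[^/\s]+)/(?P<temp>\S+)\s+(?P<busy>\S+)\s+(?P<name>\S+)"
    r"(?:\s+->\s+(?P<target>\S+))?"
)


def cold_labeled_vcls(vcl_list_output):
    """Return sorted (label, target) pairs where the label's target VCL is cold."""
    temperatures = {}  # VCL name -> temperature ("warm", "cold", ...)
    labels = {}        # label name -> name of the VCL it points to
    for line in vcl_list_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        if match.group("target"):
            labels[match.group("name")] = match.group("target")
        else:
            temperatures[match.group("name")] = match.group("temp")
    return sorted(
        (label, target)
        for label, target in labels.items()
        if temperatures.get(target) == "cold"
    )


# Illustrative input only; it mimics the assumed layout and is not real node output.
SAMPLE = """\
active      auto/warm          0 boot
available   auto/cold          0 vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89 (1 label)
available  label/warm          0 wikimedia_misc-frontend -> vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
"""

if __name__ == "__main__":
    for label, target in cold_labeled_vcls(SAMPLE):
        print(f"label {label} points at cold VCL {target}: restart, do not discard")
```

The sketch only reads text and never issues CLI commands, so running it against a node's `vcl.list` output cannot trigger the crash described above.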
Not to be confused with T188089.