varnish crashed this morning on cp3066:
Wrong turn at cache/cache_wrk.c:629:
  Worker Pool Queue does not move
version = varnish-7.1.1 revision 7cee1c581bead20e88d101ab3d72afb29f14d87a, vrt api = 15.0
ident = Linux,5.10.0-30-amd64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
now = 23667458.893371 (mono), 1744103393.647477 (real)
Backtrace:
  0x56342d11c54c: /usr/sbin/varnishd(+0x5954c) [0x56342d11c54c]
  0x56342d1980f8: /usr/sbin/varnishd(VAS_Fail+0x18) [0x56342d1980f8]
  0x56342d14473c: /usr/sbin/varnishd(pool_herder+0x88c) [0x56342d14473c]
  0x7f4c28a16ea7: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f4c28a16ea7]
  0x7f4c28936acf: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f4c28936acf]
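For future occurrences: if the management process is still up after the child restart, the last panic can usually be retrieved via the standard Varnish CLI (generic commands, nothing host-specific here):

```shell
# Show the last panic recorded by the management process
varnishadm panic.show

# After the output has been captured (e.g. pasted into the task),
# clear it so monitoring stops flagging the stale panic
varnishadm panic.clear
```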
This has been reported upstream in the past as https://github.com/varnishcache/varnish-cache/issues/3868, but AFAIK no fix has been made public for it.
The comment from Nils Goroll is a nice hint though:
> Firstly, the intention behind this panic is a good one - to improve your system's uptime by restarting the varnish worker process if something goes terribly wrong. The respective parameter is thread_pool_watchdog.
>
> The potential reasons for the watchdog barking are numerous: Your varnish worker did not run for some time, system resources are exhausted (in particular, memory), your vm got migrated. We really can't know.
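Since thread_pool_watchdog is a runtime parameter, it can be inspected and, if we suspect transient stalls (VM migration, memory pressure), raised on the fly while debugging. A sketch using the standard CLI; the 60 s value in the comment is the upstream default and worth double-checking on our build:

```shell
# Inspect the current watchdog timeout (upstream default: 60.000 s)
varnishadm param.show thread_pool_watchdog

# Temporarily raise it, e.g. to 120 s, to rule out transient stalls
# (runtime change only; does not survive a varnishd restart)
varnishadm param.set thread_pool_watchdog 120
```

To make the change persistent it would have to go into the varnishd invocation as `-p thread_pool_watchdog=120`.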
Current status: 7.1.1-1.1~bpo11+wmf3, shipping backports of https://github.com/varnishcache/varnish-cache/pull/3947 and https://github.com/varnishcache/varnish-cache/pull/3827, is deployed globally. Waiting for another occurrence of the crash. (2025-04-15)
