varnish crash upon reload after libvmod-netmapper upgrade due to liburcu6 assertion
Closed, Resolved (Public)

Description

After upgrading libvmod-netmapper on cp4028 to tackle T266567, I reloaded the varnish-frontend instance to make sure it wouldn't crash. It did crash, with the following liburcu6 assertion failure:

Oct 28 10:26:39 cp4028 varnishd[20043]: Child (49027) said varnishd: urcu-qsbr.c:470: rcu_register_thread_qsbr: Assertion `!URCU_TLS(rcu_reader).registered' failed.
Oct 28 10:26:40 cp4028 varnishd[20043]: Child (49027) died signal=6
Oct 28 10:26:40 cp4028 varnishd[20043]: Child (49027) Panic at: Wed, 28 Oct 2020 10:26:40 GMT
                                        Wrong turn at cache/cache_main.c:284:
                                        Signal 6 (Aborted) received at 0x700000bf83 si_code -6
                                        version = varnish-6.0.6 revision 29a1a8243dbef3d973aec28dc90403188c1dc8e7, vrt api = 7.1
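
For readers unfamiliar with the assertion: its text says that rcu_register_thread_qsbr was called on a thread whose thread-local reader state was already marked as registered. Below is a minimal model of that kind of guard, not liburcu's actual code. The names model_reader, model_register_thread and model_unregister_thread are hypothetical, and the model returns an error where liburcu aborts. It makes no claim about why the reload reached this state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-thread RCU reader record. */
struct model_reader {
    bool registered;
};

/* One record per thread, analogous in spirit to a TLS reader variable. */
static _Thread_local struct model_reader model_reader;

/*
 * Register the calling thread.
 * Returns 0 on success, -1 if this thread is already registered.
 * (Assumption: the real library asserts here instead of returning.)
 */
static int model_register_thread(void)
{
    if (model_reader.registered)
        return -1;
    model_reader.registered = true;
    return 0;
}

/* Clear the registration so the thread may register again later. */
static void model_unregister_thread(void)
{
    model_reader.registered = false;
}

/* Walk through the sequence: register, register again, unregister, register. */
static void model_demo(void)
{
    assert(model_register_thread() == 0);   /* first registration succeeds */
    assert(model_register_thread() == -1);  /* second one is rejected */
    model_unregister_thread();
    assert(model_register_thread() == 0);   /* accepted again after unregister */
    model_unregister_thread();              /* leave the thread unregistered */
}
```

In this model, a second registration without an intervening unregister on the same thread is the only way to hit the guard, which is the condition the log line above reports.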

Luckily I tried the reload on one host before upgrading libvmod-netmapper on all the others; otherwise multiple instances would have crashed upon their first pool/depool. Canceling the fleet-wide libvmod-netmapper upgrade.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2020-10-28T10:39:33Z] <ema> due to T266651, cancel the entry above: A:cp upgrade libvmod-netmapper to 1.9-1 T266567 T264398

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!

@Vgutierrez, since you were involved with the libvmod-netmapper upgrades, would you say that this 2-year-old issue is fixed?