
Discard of cold, labeled VCL crashes varnish parent and child
Closed, Resolved (Public)

Description

The issue described in https://github.com/varnishcache/varnish-cache/issues/2560 and in various other Varnish tickets does not only crash the varnish child when the cold VCL is used: discarding the cold VCL also crashes the parent process (and the child).

Steps to reproduce:

vcl.load vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89 /etc/varnish/wikimedia_misc-frontend.vcl
vcl.label wikimedia_misc-frontend vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
# wait for vcl_cooldown seconds plus a bit, till vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89 goes into "cold" state
vcl.discard vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89

Issuing the vcl.discard command kicks the user out of varnishadm, with the varnish service failing as follows:

Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service: main process exited, code=killed, status=6/ABRT
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: Unit varnish-frontend.service entered failed state.
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service holdoff time over, scheduling restart.
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: varnish-frontend.service failed to schedule restart job: Transaction is destructive.
Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: Unit varnish-frontend.service entered failed state.

At this point no varnishadm commands can be given, and no traffic gets served. The service needs to be started again.
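
For reference, status=6/ABRT in the first log line means the main process was killed by signal 6 (SIGABRT), which is the signal a process receives when it calls abort(), for instance on a failed assertion. Below is a minimal Python sketch for pulling those fields out of such a line. The regular expression and the parse_exit helper are illustrative names, not part of any existing tooling, and they only match the systemd message format shown above.

```python
import re
import signal

# Matches the systemd "main process exited" message format from the log
# excerpt above. Other systemd versions may word this message differently.
EXIT_RE = re.compile(
    r"(?P<unit>\S+\.service): main process exited, "
    r"code=(?P<code>\w+), status=(?P<num>\d+)/(?P<name>\w+)"
)


def parse_exit(line):
    """Return unit, exit code kind and status of a matching line, else None."""
    match = EXIT_RE.search(line)
    if match is None:
        return None
    return {
        "unit": match.group("unit"),
        "code": match.group("code"),
        "status": int(match.group("num")),
        "name": match.group("name"),
    }


line = (
    "Jul 23 14:14:44 traffic-text-varnish5 systemd[1]: "
    "varnish-frontend.service: main process exited, code=killed, status=6/ABRT"
)
info = parse_exit(line)
if info is not None and info["code"] == "killed":
    # For code=killed, the numeric status is the signal number.
    print(info["unit"], "killed by", signal.Signals(info["status"]).name)
```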

We've deployed a workaround for the issue of labeled VCLs going cold, and we don't automatically discard cold labeled VCLs anyway. There are, however, a number of old cold labeled VCLs currently present on various text nodes.

We should restart those varnishes (surely not discard the cold VCLs!) as a precaution.
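
To find the affected nodes, something along these lines could be run against the output of vcl.list. This is only a sketch: it assumes each line looks like "status state/temperature busy name", with label lines of the form "label -> target", which may not hold for every Varnish version. The cold_labeled_vcls helper and the sample text (including the names vcl-aaaa and vcl-bbbb) are made up for illustration and are not captured from a real host.

```python
def cold_labeled_vcls(vcl_list_output):
    """Return sorted (label, target) pairs whose target VCL is cold.

    Assumes lines of the form "status state/temperature busy name" and
    label lines of the form "... label/warm busy label -> target".
    """
    temperature = {}    # VCL name -> temperature ("warm", "cold", ...)
    label_targets = {}  # label name -> name of the VCL it points to
    for raw in vcl_list_output.splitlines():
        fields = raw.split()
        if len(fields) < 4 or "/" not in fields[1]:
            continue  # not a line in the assumed format
        state, temp = fields[1].split("/", 1)
        name = fields[3]
        if state == "label" and "->" in fields:
            arrow = fields.index("->")
            if arrow + 1 < len(fields):
                label_targets[name] = fields[arrow + 1]
        else:
            temperature[name] = temp
    return sorted(
        (label, target)
        for label, target in label_targets.items()
        if temperature.get(target) == "cold"
    )


# Illustrative input only, written by hand in the assumed format.
SAMPLE = """\
active     auto/warm          0 vcl-aaaa
available  auto/cold          0 vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
available  label/warm         0 wikimedia_misc-frontend -> vcl-629e1be6-f8b3-4dab-85c1-e6d006c6ba89
available  auto/cold          0 vcl-bbbb
"""

affected = cold_labeled_vcls(SAMPLE)
for label, target in affected:
    print(f"{label} points at cold VCL {target}: restart, do not discard")
```

An empty result means no label on that node points at a cold VCL. The unlabeled cold VCL in the sample is deliberately not reported, since the crash described here concerns cold VCLs that carry a label.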

Not to be confused with T188089.

Event Timeline

ema created this task. Jul 23 2018, 3:16 PM
Restricted Application added a project: Operations. Jul 23 2018, 3:16 PM
Restricted Application added a subscriber: Aklapper.
ema triaged this task as Medium priority. Jul 23 2018, 3:16 PM
ema updated the task description. Jul 23 2018, 3:21 PM
ema moved this task from Triage to Caching on the Traffic board. Jul 23 2018, 4:29 PM

Mentioned in SAL (#wikimedia-operations) [2018-07-24T08:38:05Z] <ema> restart varnish-fe on cache_text instances with cold, labeled VCL T200207

ema renamed this task from "Discard of cold labeled VCL crashes varnish parent and child" to "Discard of cold, labeled VCL crashes varnish parent and child". Jul 24 2018, 8:58 AM
ema closed this task as Resolved. Aug 6 2018, 9:00 AM
ema claimed this task.

No more cold VCLs, workaround working fine. Closing.