
varnishkafka statsv and webrequest crashed on cp1081
Closed, Resolved · Public

Description

Today varnishkafka-statsv.service and varnishkafka-webrequest.service crashed almost at the same time on cp1081:

statsv:

Aug 27 14:18:28 cp1081 varnishkafka[220830]: Assert error in vslc_vtx_next(), vsl_dispatch.c line 285:
Aug 27 14:18:28 cp1081 varnishkafka[220830]:   Condition(c->offset <= c->vtx->len) not true.

webrequest:

Aug 27 14:18:26 cp1081 varnishkafka[220836]: Assert error in vslc_vtx_next(), vsl_dispatch.c line 285:
Aug 27 14:18:26 cp1081 varnishkafka[220836]:   Condition(c->offset <= c->vtx->len) not true.
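To check whether the same assert shows up on other hosts or at other times, the journal lines above can be matched mechanically. This is only a rough sketch: the regular expression is inferred from the four lines quoted in this task, and the names `ASSERT_RE`, `find_asserts` and `SAMPLE` are made up for illustration, not part of varnishkafka or Varnish.

```python
import re

# Pattern inferred from the log lines quoted in this task (an assumption,
# not an official log format): "<date> <host> varnishkafka[<pid>]: Assert
# error in <func>(), <file> line <n>:"
ASSERT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"varnishkafka\[(?P<pid>\d+)\]: Assert error in (?P<func>\w+)\(\), "
    r"(?P<file>\S+) line (?P<line>\d+):"
)


def find_asserts(lines):
    """Return one dict per 'Assert error' header line found in `lines`."""
    hits = []
    for text in lines:
        match = ASSERT_RE.match(text)
        if match:
            hits.append(match.groupdict())
    return hits


# The log lines from the description, copied verbatim.
SAMPLE = [
    "Aug 27 14:18:28 cp1081 varnishkafka[220830]: Assert error in vslc_vtx_next(), vsl_dispatch.c line 285:",
    "Aug 27 14:18:28 cp1081 varnishkafka[220830]:   Condition(c->offset <= c->vtx->len) not true.",
    "Aug 27 14:18:26 cp1081 varnishkafka[220836]: Assert error in vslc_vtx_next(), vsl_dispatch.c line 285:",
    "Aug 27 14:18:26 cp1081 varnishkafka[220836]:   Condition(c->offset <= c->vtx->len) not true.",
]

if __name__ == "__main__":
    for hit in find_asserts(SAMPLE):
        print(hit["ts"], hit["host"], hit["pid"], hit["func"], hit["file"], hit["line"])
```

Only the header line of each assert is matched; the `Condition(...)` continuation line is skipped on purpose, since the function, file and line number are enough to group occurrences.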

systemd's Restart=on-failure did kick in, but in both cases the same assert was hit again shortly after the restart and the services did not come back up properly. I tried a manual restart a few minutes later (14:39), which worked.

See P8984 and P8985 for the relevant logs.

Event Timeline

ema created this task. Aug 27 2019, 2:47 PM
Restricted Application added a project: Operations. Aug 27 2019, 2:47 PM
Restricted Application added a subscriber: Aklapper.
ema triaged this task as Normal priority. Aug 27 2019, 2:47 PM
ema moved this task from Triage to Caching on the Traffic board.
Nuria added a subscriber: Ottomata. Aug 29 2019, 5:04 PM

ping @Ottomata can you think of any steps we should take here next?

From a very cursory glance and your links, this looks like possible VSL corruption. It might just be a fluke, possibly caused by load. I think we should not spend cycles on figuring it out unless it happens again or often.

See also: https://github.com/varnishcache/varnish-cache/issues/2237

Nuria closed this task as Resolved. Aug 30 2019, 8:12 AM

Agreed, closing.

elukey added a comment. Sep 9 2019, 6:35 AM

I agree with Andrew: the issue seems to be an assert violation (or similar) in the Varnish libs, so it is unlikely to be related to a Varnishkafka bug (famous last words). +1 to waiting for it to re-occur before spending more time.