Varnishkafka should auto-reconnect to abandoned VSM
Closed, Resolved | Public | 8 Estimated Story Points

Description

When Varnish restarts, the SHM log file gets abandoned and Varnishkafka exits gracefully as a consequence. It would be nice to implement an "auto-reconnect" behavior like the one in varnishlog:

"""
$ sudo varnishlog -g request
Log abandoned
Log reacquired
"""

We currently rely on systemd to restart Varnishkafka when it fails or shuts down.

Useful snippet from VUT:
https://github.com/varnishcache/varnish-cache/blob/4.1/lib/libvarnishtools/vut.c#L366
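The reconnect behavior described above could be sketched roughly as follows. This is a minimal simulation, not the VUT or Varnishkafka code: `struct log_source`, `log_open`, `log_poll` and the `attempts_until_up` field are hypothetical names made up for illustration, and the shared-memory log is faked with a small state machine. A real implementation would call libvarnishapi where the stubs are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a shared-memory log handle; NOT the Varnish API. */
enum log_state { LOG_OPEN, LOG_ABANDONED, LOG_CLOSED };

struct log_source {
    enum log_state state;
    int attempts_until_up; /* simulated: open attempts that fail before Varnish is back */
    int reconnects;        /* number of successful re-acquisitions */
};

/* Simulated open: keeps failing until the new Varnish instance is up. */
static bool log_open(struct log_source *src)
{
    if (src->attempts_until_up > 0) {
        src->attempts_until_up--;
        return false;
    }
    src->state = LOG_OPEN;
    return true;
}

/*
 * One iteration of the main loop. Returns true when records can be read,
 * false when the caller should sleep briefly and poll again instead of exiting.
 */
static bool log_poll(struct log_source *src)
{
    assert(src != NULL);

    if (src->state == LOG_ABANDONED) {
        fprintf(stderr, "Log abandoned\n");
        src->state = LOG_CLOSED; /* drop the stale handle */
    }
    if (src->state == LOG_CLOSED) {
        if (!log_open(src))
            return false; /* Varnish not back yet: retry later */
        src->reconnects++;
        fprintf(stderr, "Log reacquired\n");
    }
    return true;
}
```

A caller would invoke `log_poll` at the top of its read loop and sleep for a short interval whenever it returns false, rather than exiting and leaving the restart to systemd.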

Steps for this task:

  1. Add the code necessary to implement the re-connect feature to Varnishkafka
  2. Code Review, tests
  3. Debian packaging
  4. Deploy to cp Misc and Maps
  5. Deploy to Upload and Text

Event Timeline

ema triaged this task as Medium priority. Jun 27 2016, 11:00 AM

What about sequence numbers?

Currently, when Varnish restarts we restart varnishkafka, so sequence numbers go back to zero. If varnishkafka reconnects to the new Varnish instance automatically (without a restart), sequence numbers will continue to increase.

Milimetric moved this task from Dashiki to Backlog (Later) on the Analytics board.
elukey lowered the priority of this task from Medium to Low. Jul 18 2016, 8:41 AM

Change 311415 had a related patch set uploaded (by Elukey):
Improve resilience during varnish restarts

https://gerrit.wikimedia.org/r/311415

Change 311415 merged by Elukey:
Improve resilience during varnish (re)starts

https://gerrit.wikimedia.org/r/311415

Change 311965 had a related patch set uploaded (by Elukey):
Improve resilience during varnish (re)starts

https://gerrit.wikimedia.org/r/311965

Change 311965 merged by Elukey:
Improve resilience during varnish (re)starts

https://gerrit.wikimedia.org/r/311965

Mentioned in SAL (#wikimedia-operations) [2016-09-21T17:40:29Z] <elukey> installed varnishkafka 1.0.12-1 on cp3034.esams (T138747)

elukey raised the priority of this task from Low to Medium. Sep 22 2016, 6:09 AM
elukey edited projects, added Analytics-Kanban; removed Patch-For-Review, Analytics.
elukey updated the task description.
elukey set the point value for this task to 8.
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

Rolled out in cache misc / maps / upload (text is still using Varnish 3)