2 gt 1
Also that alert doesn't have a runbook, might be worth adding one.
That could be done (I think) by adding a note_url field, similar to: https://github.com/wikimedia/puppet/commit/12093fb9c98873e04c790109bdb1e606694acb0f
That alert basically means that a varnish frontend daemon crashed (and as usual was auto-restarted by a manager process). These are pretty rare and usually worth some investigation.
Runbook should probably say to gather up the crash info from syslog and attach it to a private paste or task for later analysis (this isn't just about user PII - the tech details of the crash could highlight a potential exploitable bug we want to avoid drawing attention to until we understand and/or mitigate).
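To make the "gather up the crash info from syslog" runbook step concrete, here is a minimal sketch of a helper that pulls varnishd child-crash reports out of syslog text so they can be pasted privately. It assumes the varnishd manager logs lines like `varnishd[PID]: Child (PID) died ...` or `Child (PID) Panic ...`, with the panic details on the following varnishd lines. The exact message wording varies between Varnish versions, so the patterns (and the function name `extract_crash_reports`) are hypothetical and should be checked against a real crash before use.

```python
import re
import sys
from typing import Iterable, List

# Hypothetical patterns: assumed wording of varnishd manager messages in syslog.
# Verify against an actual crash on the Varnish version in use.
VARNISHD_LINE = re.compile(r"varnishd\[\d+\]:")
CRASH_MARKER = re.compile(r"varnishd\[\d+\]: Child \(\d+\) (?:died|Panic)")


def extract_crash_reports(lines: Iterable[str]) -> List[str]:
    """Group each crash marker with the consecutive varnishd lines after it.

    A report ends at the first line not emitted by varnishd. Back-to-back
    crashes with no other log line in between end up in the same report.
    """
    reports: List[str] = []
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not current:
            # Not inside a report yet: start one only at a crash marker.
            if CRASH_MARKER.search(line):
                current = [line]
        elif VARNISHD_LINE.search(line):
            # Still inside the report: keep the panic/backtrace lines.
            current.append(line)
        else:
            # A line from another program closes the current report.
            reports.append("\n".join(current))
            current = []
    if current:
        reports.append("\n".join(current))
    return reports


if __name__ == "__main__":
    # Usage (assumed path): python3 varnish_crashes.py /var/log/syslog
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for report in extract_crash_reports(fh):
            print(report)
            print("-" * 40)
```

The output is meant to go into a private paste or task rather than a public comment, for the reasons given above.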
I think the only way to clear the counter is to do a full frontend restart, or we can just ACK it with a task# and leave it be.
Private paste for this one: P8576
Change 513977 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/varnish4@debian-wmf] Add 0020-assert-error-http1_minimal_response.patch
Change 513977 merged by Ema:
[operations/debs/varnish4@debian-wmf] Add 0020-assert-error-http1_minimal_response.patch
Mentioned in SAL (#wikimedia-operations) [2019-06-06T10:30:01Z] <ema> varnish 5.1.3-1wm10 uploaded to stretch-wikimedia T224694
Change 514699 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cp1075: stop passing gethdr_extrachance=0
Change 514699 merged by Ema:
[operations/puppet@production] cp1075: do not pass gethdr_extrachance=0
Mentioned in SAL (#wikimedia-operations) [2019-06-06T12:00:29Z] <ema> cp1075: upgrade varnish to 5.1.3-1wm10 T224694
Mentioned in SAL (#wikimedia-operations) [2019-06-06T12:11:04Z] <ema> cp1075: repool with varnish 5.1.3-1wm10 T224694
Change 517359 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: stop passing gethdr_extrachance to varnish
Change 517415 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cp4027: upgrade Varnish to 5.1.3-1wm10
Change 517415 merged by Ema:
[operations/puppet@production] cp4027: upgrade Varnish to 5.1.3-1wm10
Mentioned in SAL (#wikimedia-operations) [2019-06-17T13:25:11Z] <ema> cp4027: upgrade Varnish packages to 5.1.3-1wm10 T224694
Mentioned in SAL (#wikimedia-operations) [2019-06-17T13:45:22Z] <ema> reboot cp4027 for dist and Varnish upgrade T224694
Change 517359 merged by Ema:
[operations/puppet@production] cache: stop passing gethdr_extrachance to varnish
Mentioned in SAL (#wikimedia-operations) [2019-06-18T12:27:34Z] <ema> cp5007 (varnish-be text): reboot for kernel and varnish upgrade T224694
Mentioned in SAL (#wikimedia-operations) [2019-06-18T12:49:09Z] <ema> cp3034 (ats-be upload) cp2002 (varnish-be upload): reboot for kernel and varnish upgrade T224694
Mentioned in SAL (#wikimedia-operations) [2019-06-18T13:10:04Z] <ema> cache nodes: begin rolling reboots for kernel and varnish upgrades T224694
Mentioned in SAL (#wikimedia-operations) [2019-06-18T16:07:38Z] <ema> cache nodes: stop rolling reboots for today, 17/80 done T224694
Mentioned in SAL (#wikimedia-operations) [2019-06-19T08:01:26Z] <ema> cache nodes: resume rolling reboots for kernel and varnish upgrades T224694
Mentioned in SAL (#wikimedia-operations) [2019-06-19T11:07:13Z] <ema> cache nodes: pause rolling reboots for kernel and varnish upgrades T224694 T225998
Mentioned in SAL (#wikimedia-operations) [2019-06-19T12:20:48Z] <ema> cache nodes: resume rolling reboots for kernel and varnish upgrades T224694 T225998
Mentioned in SAL (#wikimedia-operations) [2019-06-19T16:01:26Z] <ema> cache nodes: stop rolling reboots for today, 47/80 done T224694 T225998
Mentioned in SAL (#wikimedia-operations) [2019-06-20T09:17:36Z] <ema> cache nodes: resume rolling reboots for kernel and varnish upgrades T224694 T225998 T226048