
cp3041 - Varnish frontend child restarted icinga alert
Closed, Resolved · Public · 0 Story Points

Description

Event Timeline

ayounsi triaged this task as Normal priority. May 30 2019, 7:12 PM
ayounsi created this task.
Restricted Application added a project: Operations. May 30 2019, 7:12 PM
Restricted Application added a subscriber: Aklapper.

That alert basically means that a varnish frontend daemon crashed (and as usual was auto-restarted by a manager process). These are pretty rare and usually worth some investigation.

The runbook should probably say to gather the crash info from syslog and attach it to a private paste or task for later analysis (this isn't just about user PII - the technical details of the crash could highlight a potentially exploitable bug that we want to avoid drawing attention to until we understand and/or mitigate it).
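For the runbook, a minimal sketch of the "gather up the crash info from syslog" step might look like the following. The function name and the message patterns are assumptions for illustration, not taken from this task; the exact wording of the crash messages should be checked against the real syslog on the affected host before relying on it.

```python
import re

# Assumed patterns for varnishd crash-related syslog messages.
# These are illustrative; verify them against the actual log lines.
CRASH_PATTERN = re.compile(
    r"varnishd\[\d+\]: .*(Panic|died signal|Assert error)",
    re.IGNORECASE,
)


def extract_crash_lines(syslog_lines):
    """Return the syslog lines that look like varnishd crash output.

    syslog_lines is any iterable of strings, e.g. an open file object.
    """
    return [
        line.rstrip("\n")
        for line in syslog_lines
        if CRASH_PATTERN.search(line)
    ]
```

The result would then be pasted by hand into a private paste or task, as described above, rather than posted anywhere public.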

I think the only way to clear the counter is to do a full frontend restart, or we can just ACK the alert with a task number and leave it be.
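To see what the counter currently reads before deciding between a restart and an ACK, something like this sketch could parse the text printed by `varnishstat -1`. It assumes the usual whitespace-separated layout of that output (counter name, value, rate, description). Which counter the icinga check actually reads is not stated in this task, so the counter name is left as a parameter and `read_counter` is a hypothetical helper.

```python
def read_counter(varnishstat_output, counter):
    """Return the integer value of one counter, or None if it is absent.

    varnishstat_output is the text printed by `varnishstat -1`, assumed
    to have the counter name in the first column and its value in the
    second column of each line.
    """
    for line in varnishstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == counter:
            return int(fields[1])
    return None
```

A caller might pass a name such as `MGT.child_panic`, but that choice is an assumption and should be confirmed against the alert definition.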

Private paste for this one: P8576

Change 513977 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/varnish4@debian-wmf] Add 0020-assert-error-http1_minimal_response.patch

https://gerrit.wikimedia.org/r/513977

ema moved this task from Triage to Caching on the Traffic board. Jun 3 2019, 3:09 PM

Change 513977 merged by Ema:
[operations/debs/varnish4@debian-wmf] Add 0020-assert-error-http1_minimal_response.patch

https://gerrit.wikimedia.org/r/513977

Mentioned in SAL (#wikimedia-operations) [2019-06-06T10:30:01Z] <ema> varnish 5.1.3-1wm10 uploaded to stretch-wikimedia T224694

Change 514699 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cp1075: stop passing gethdr_extrachance=0

https://gerrit.wikimedia.org/r/514699

Change 514699 merged by Ema:
[operations/puppet@production] cp1075: do not pass gethdr_extrachance=0

https://gerrit.wikimedia.org/r/514699

Mentioned in SAL (#wikimedia-operations) [2019-06-06T12:00:29Z] <ema> cp1075: upgrade varnish to 5.1.3-1wm10 T224694

Mentioned in SAL (#wikimedia-operations) [2019-06-06T12:11:04Z] <ema> cp1075: repool with varnish 5.1.3-1wm10 T224694

Change 517359 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: stop passing gethdr_extrachance to varnish

https://gerrit.wikimedia.org/r/517359

Change 517415 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cp4027: upgrade Varnish to 5.1.3-1wm10

https://gerrit.wikimedia.org/r/517415

Change 517415 merged by Ema:
[operations/puppet@production] cp4027: upgrade Varnish to 5.1.3-1wm10

https://gerrit.wikimedia.org/r/517415

Mentioned in SAL (#wikimedia-operations) [2019-06-17T13:25:11Z] <ema> cp4027: upgrade Varnish packages to 5.1.3-1wm10 T224694

Mentioned in SAL (#wikimedia-operations) [2019-06-17T13:45:22Z] <ema> reboot cp4027 for dist and Varnish upgrade T224694

Change 517359 merged by Ema:
[operations/puppet@production] cache: stop passing gethdr_extrachance to varnish

https://gerrit.wikimedia.org/r/517359

Mentioned in SAL (#wikimedia-operations) [2019-06-18T12:27:34Z] <ema> cp5007 (varnish-be text): reboot for kernel and varnish upgrade T224694

Mentioned in SAL (#wikimedia-operations) [2019-06-18T12:49:09Z] <ema> cp3034 (ats-be upload) cp2002 (varnish-be upload): reboot for kernel and varnish upgrade T224694

Mentioned in SAL (#wikimedia-operations) [2019-06-18T13:10:04Z] <ema> cache nodes: begin rolling reboots for kernel and varnish upgrades T224694

Mentioned in SAL (#wikimedia-operations) [2019-06-18T16:07:38Z] <ema> cache nodes: stop rolling reboots for today, 17/80 done T224694

Mentioned in SAL (#wikimedia-operations) [2019-06-19T08:01:26Z] <ema> cache nodes: resume rolling reboots for kernel and varnish upgrades T224694

Mentioned in SAL (#wikimedia-operations) [2019-06-19T11:07:13Z] <ema> cache nodes: pause rolling reboots for kernel and varnish upgrades T224694 T225998

Mentioned in SAL (#wikimedia-operations) [2019-06-19T12:20:48Z] <ema> cache nodes: resume rolling reboots for kernel and varnish upgrades T224694 T225998

Mentioned in SAL (#wikimedia-operations) [2019-06-19T16:01:26Z] <ema> cache nodes: stop rolling reboots for today, 47/80 done T224694 T225998

Mentioned in SAL (#wikimedia-operations) [2019-06-20T09:17:36Z] <ema> cache nodes: resume rolling reboots for kernel and varnish upgrades T224694 T225998 T226048

ema closed this task as Resolved. Jun 21 2019, 2:21 PM
ema claimed this task.

All cache nodes are currently running Varnish 5.1.3-1wm10, which fixes this.