Page MenuHomePhabricator

Prometheus Varnish Exporter fails to start on some instances in DRMRS with Out of Memory Error
Closed, InvalidPublic

Description

Instance affected were cp6010 and cp6014 . The service prometheus-varnish-exporter@frontend failed to
recover after an alert indicating its unavailability was triggered. Usually, the service unavailability recovers after
a few minutes, but that was not the case with these instances service. This happened about the same time
when ac core router in DRMRS was down, see discussion on task T300277

A manual restart was required, for them to run. Further checking with journalctl -u prometheus-varnish-exporter@frontend
the logs showed that the instance had crashed due to an out of memory error and eventually too many restarts hence the
failure in recovery. Below are the logs from jounalctl

Feb 21 08:30:16 cp6014 systemd[1]: Started Prometheus exporter for Varnish (instance frontend).
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: fatal error: runtime: out of memory
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime stack:
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.throw(0x863ae6, 0x16)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/panic.go:608 +0x72 fp=0x7ffeb4b3fd00 sp=0x7ffeb4b3fcd0 pc=0x42cde2
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.sysMap(0xc000000000, 0x4000000, 0xbc1098)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/mem_linux.go:156 +0xc7 fp=0x7ffeb4b3fd40 sp=0x7ffeb4b3fd00 pc=0x4189d7
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mheap).sysAlloc(0xba7d80, 0x4000000, 0x0, 0x0)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/malloc.go:619 +0x1c7 fp=0x7ffeb4b3fdc8 sp=0x7ffeb4b3fd40 pc=0x40c757
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mheap).grow(0xba7d80, 0x1, 0x0)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/mheap.go:920 +0x42 fp=0x7ffeb4b3fe20 sp=0x7ffeb4b3fdc8 pc=0x424d62
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mheap).allocSpanLocked(0xba7d80, 0x1, 0xbc10a8, 0x0)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/mheap.go:848 +0x337 fp=0x7ffeb4b3fe60 sp=0x7ffeb4b3fe20 pc=0x424be7
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mheap).alloc_m(0xba7d80, 0x1, 0x2a, 0x0)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/mheap.go:692 +0x119 fp=0x7ffeb4b3fea0 sp=0x7ffeb4b3fe60 pc=0x4243f9
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mheap).alloc.func1()
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/mheap.go:759 +0x4c fp=0x7ffeb4b3fed8 sp=0x7ffeb4b3fea0 pc=0x4556ac
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mheap).alloc(0xba7d80, 0x1, 0x1002a, 0x7ffeb4b3ff40)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/mheap.go:758 +0x8a fp=0x7ffeb4b3ff28 sp=0x7ffeb4b3fed8 pc=0x42469a
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mcentral).grow(0xba9b38, 0x0)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/mcentral.go:232 +0x94 fp=0x7ffeb4b3ff70 sp=0x7ffeb4b3ff28 pc=0x4183d4
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mcentral).cacheSpan(0xba9b38, 0x0)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/mcentral.go:106 +0x2f8 fp=0x7ffeb4b3ffb8 sp=0x7ffeb4b3ff70 pc=0x417f28
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mcache).refill(0x7f29b908d000, 0x220000002a)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/mcache.go:122 +0x95 fp=0x7ffeb4b3ffe8 sp=0x7ffeb4b3ffb8 pc=0x417ae5
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mcache).nextFree.func1()
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/malloc.go:749 +0x32 fp=0x7ffeb4b40008 sp=0x7ffeb4b3ffe8 pc=0x454ac2
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.(*mcache).nextFree(0x7f29b908d000, 0xbc112a, 0x4000, 0x7f29b908d000, 0x7ffeb4b400c8)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/malloc.go:748 +0xb6 fp=0x7ffeb4b40060 sp=0x7ffeb4b40008 pc=0x40ce06
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.mallocgc(0x180, 0x85b180, 0x7ffeb4b40101, 0x7f29b9091000)
Feb 21 08:30:16 cp6014 systemd[1]: prometheus-varnish-exporter@frontend.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/malloc.go:903 +0x793 fp=0x7ffeb4b40100 sp=0x7ffeb4b40060 pc=0x40d753
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.newobject(0x85b180, 0xbc1100)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/malloc.go:1032 +0x38 fp=0x7ffeb4b40130 sp=0x7ffeb4b40100 pc=0x40db38
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.malg(0x7f2900008000, 0x7f29b908d000)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/proc.go:3285 +0x31 fp=0x7ffeb4b40170 sp=0x7ffeb4b40130 pc=0x436321
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.mpreinit(0xba2860)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/os_linux.go:311 +0x29 fp=0x7ffeb4b40190 sp=0x7ffeb4b40170 pc=0x42b219
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.mcommoninit(0xba2860)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/proc.go:624 +0xc1 fp=0x7ffeb4b401c8 sp=0x7ffeb4b40190 pc=0x42fc31
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.schedinit()
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/proc.go:546 +0x89 fp=0x7ffeb4b40230 sp=0x7ffeb4b401c8 pc=0x42f8f9
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]: runtime.rt0_go(0x7ffeb4b40268, 0x6, 0x7ffeb4b40268, 0x0, 0x7f29b92bd0ca, 0x6, 0x7ffeb4b40eb6, 0x7ffeb4b40edb, 0x7ffeb4b40ee0, 0x7ffeb4b40ee3, ...)
Feb 21 08:30:16 cp6014 prometheus-varnish-exporter[2227]:         /usr/lib/go-1.11/src/runtime/asm_amd64.s:195 +0x11a fp=0x7ffeb4b40238 sp=0x7ffeb4b40230 pc=0x45739a
Feb 21 08:30:16 cp6014 systemd[1]: prometheus-varnish-exporter@frontend.service: Failed with result 'exit-code'.

Below is grafana's panel showing the cpu and memory utilization of on instance cp6014:

image.png (449×1 px, 62 KB)

The gap in the graphs do indicate when cr2-drmrs was down.

Event Timeline

BCornwall subscribed.

I went through historical metrics and found no evidence that this is still occurring. Considering the cp nodes have been reimaged and this seemingly has not been a problem, I'm going to mark this as invalid. Please do re-open if it's still an issue!