
Configure a Prometheus dead man's snitch alert
Closed, Resolved, Public

Details

Related Changes in Gerrit:
Repo                 Branch      Lines +/-
operations/puppet    production  +201 -18
labs/private         master      +1 -1
labs/private         master      +4 -0
operations/puppet    production  +2 -2
operations/puppet    production  +0 -21
operations/puppet    production  +13 -4
operations/puppet    production  +17 -69
operations/puppet    production  +39 -16
operations/puppet    production  +5 -5
operations/puppet    production  +7 -47
operations/puppet    production  +40 -5
operations/puppet    production  +31 -25
operations/puppet    production  +37 -4
operations/puppet    production  +5 -4
operations/puppet    production  +2 -2
operations/puppet    production  +1 -1
operations/puppet    production  +52 -22
operations/puppet    production  +2 -2
operations/puppet    production  +1 -1
operations/dns       master      +4 -0
operations/puppet    production  +19 -19
operations/puppet    production  +598 -0
operations/puppet    production  +568 -0
operations/puppet    production  +448 -1
operations/puppet    production  +158 -0

Event Timeline

Change #1159369 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] deadmansnitch: add dms alert and am hook

https://gerrit.wikimedia.org/r/1159369

Change #1165514 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: add dead man snitch and public endpoint

https://gerrit.wikimedia.org/r/1165514

Change #1159369 abandoned by Tiziano Fogli:

[operations/puppet@production] deadmansnitch: add dms alert and am hook

Reason:

Replaced by a new patchset.

https://gerrit.wikimedia.org/r/1159369

Change #1165518 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: add dead man snitch and public endpoint

https://gerrit.wikimedia.org/r/1165518

Change #1165514 abandoned by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: add dead man snitch and public endpoint

Reason:

The Change-Id was regenerated by mistake.

https://gerrit.wikimedia.org/r/1165514

Change #1167157 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: add dead man switch and public endpoint

https://gerrit.wikimedia.org/r/1167157

Change #1165518 abandoned by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: add dead man switch and public endpoint

Reason:

The Change-Id was regenerated by mistake.

https://gerrit.wikimedia.org/r/1165518

Change #1167157 merged by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: add dead man switch and public endpoint

https://gerrit.wikimedia.org/r/1167157

Change #1169668 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/dns@master] prom/metamonitor: add CNAMEs for metamonitoring endpoints

https://gerrit.wikimedia.org/r/1169668

Change #1170104 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: make physical vhosts agnostic to the machine hostname

https://gerrit.wikimedia.org/r/1170104

Change #1170104 merged by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: make physical vhosts agnostic to the machine hostname

https://gerrit.wikimedia.org/r/1170104

Change #1169668 merged by Tiziano Fogli:

[operations/dns@master] prom/metamonitor: add CNAMEs for metamonitoring endpoints

https://gerrit.wikimedia.org/r/1169668

Change #1170282 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prometheus::pop: manage pop Prometheus instances centrally

https://gerrit.wikimedia.org/r/1170282

Change #1170282 merged by Tiziano Fogli:

[operations/puppet@production] prometheus::pop: manage pop Prometheus instances centrally

https://gerrit.wikimedia.org/r/1170282

Change #1170286 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: simplify PQL query to retrieve instance list

https://gerrit.wikimedia.org/r/1170286

Change #1170286 merged by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: simplify PQL query to retrieve instance list

https://gerrit.wikimedia.org/r/1170286

Change #1170360 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: hide DeadManSwitch alerts in Karma

https://gerrit.wikimedia.org/r/1170360

Change #1170360 merged by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: hide DeadManSwitch alerts in Karma

https://gerrit.wikimedia.org/r/1170360

Change #1170540 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: fix typo on karma erb config file

https://gerrit.wikimedia.org/r/1170540

Change #1170540 merged by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: fix typo on karma erb config file

https://gerrit.wikimedia.org/r/1170540

Change #1170545 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: fix indentation on karma erb config file

https://gerrit.wikimedia.org/r/1170545

Change #1170545 merged by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: fix indentation on karma erb config file

https://gerrit.wikimedia.org/r/1170545

Change #1171170 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: add listen_port to public_endpoint vhost template

https://gerrit.wikimedia.org/r/1171170

Change #1171171 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamonitor: force gunicorn to log to a file

https://gerrit.wikimedia.org/r/1171171

Change #1171170 merged by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: add listen_port to public_endpoint vhost template

https://gerrit.wikimedia.org/r/1171170

Change #1171171 merged by Tiziano Fogli:

[operations/puppet@production] prom/metamonitor: force gunicorn to log to a file

https://gerrit.wikimedia.org/r/1171171

Change #1171546 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] prom/metamon: add a dedicated sysuser for the daemons

https://gerrit.wikimedia.org/r/1171546

Change #1171546 merged by Tiziano Fogli:

[operations/puppet@production] prom/metamon: add a dedicated sysuser for the daemons

https://gerrit.wikimedia.org/r/1171546

"Dead Man Switch" alerts have been configured.
The implementation has been documented here: https://wikitech.wikimedia.org/wiki/Prometheus#Meta-Monitoring.
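For readers new to the pattern: a dead man's switch inverts normal alerting. Prometheus keeps one alert permanently firing, Alertmanager keeps re-delivering it to a receiver (the "am hook"), and it is the *absence* of those deliveries that signals a broken pipeline. The sketch below shows only the receiver-side bookkeeping. It is hypothetical: the class name, the `alertname` value and the timeout are assumptions, not the code deployed here (see the Wikitech page for that). The payload fields (`alerts`, `status`, `labels`) follow Alertmanager's generic webhook format.

```python
import json
import time
from typing import Callable, Optional


class DeadManSwitch:
    """Hypothetical heartbeat tracker fed by Alertmanager webhook calls.

    Prometheus keeps an always-firing alert; Alertmanager re-sends it to
    this receiver periodically. If the notifications stop, the pipeline
    (Prometheus -> Alertmanager -> receiver) is assumed to be broken.
    """

    def __init__(self, alertname: str, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.alertname = alertname  # assumed label value, e.g. "DeadManSwitch"
        self.timeout_s = timeout_s  # max silence tolerated before declaring failure
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen: Optional[float] = None

    def handle_webhook(self, body: bytes) -> bool:
        """Record a heartbeat if the payload carries our firing alert."""
        payload = json.loads(body)
        for alert in payload.get("alerts", []):
            if (alert.get("status") == "firing"
                    and alert.get("labels", {}).get("alertname") == self.alertname):
                self.last_seen = self.clock()
                return True
        return False

    def is_alive(self) -> bool:
        """False if no heartbeat has arrived yet, or the last one is too old."""
        if self.last_seen is None:
            return False
        return self.clock() - self.last_seen <= self.timeout_s
```

The timeout has to be comfortably larger than the Alertmanager route's `repeat_interval`, otherwise normal gaps between re-deliveries would look like an outage. Because the alert fires permanently by design, it is also noise on dashboards, which is presumably why a change above hides DeadManSwitch alerts in Karma.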

Today the metamonitoring paged the on-call SREs. Checking metamonitoring_public_endpoint.service on alert1002, I found a Python stack trace logged at the time of the alert:

Oct 02 06:09:41 alert1002 systemd[1]: Stopping metamonitoring_public_endpoint.service - Gunicorn for metamonitor_public_endpoint...
Oct 02 06:09:41 alert1002 gunicorn[3038713]: --- Logging error ---
Oct 02 06:09:41 alert1002 gunicorn[3038713]: --- Logging error ---
Oct 02 06:09:41 alert1002 gunicorn[3038713]: Traceback (most recent call last):
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 224, in run
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     handler()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 257, in handle_term
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     raise StopIteration
Oct 02 06:09:41 alert1002 gunicorn[3038713]: StopIteration
Oct 02 06:09:41 alert1002 gunicorn[3038713]: During handling of the above exception, another exception occurred:
Oct 02 06:09:41 alert1002 gunicorn[3038713]: Traceback (most recent call last):
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1114, in emit
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.flush()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1094, in flush
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.stream.flush()
Oct 02 06:09:41 alert1002 gunicorn[3038713]: RuntimeError: reentrant call inside <_io.BufferedWriter name='/var/log/o11y-metamonitoring/public_endpoint.log'>
Oct 02 06:09:41 alert1002 gunicorn[3038713]: During handling of the above exception, another exception occurred:
Oct 02 06:09:41 alert1002 gunicorn[3038713]: Traceback (most recent call last):
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1114, in emit
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.flush()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1094, in flush
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.stream.flush()
Oct 02 06:09:41 alert1002 gunicorn[3038713]: RuntimeError: reentrant call inside <_io.BufferedWriter name='/var/log/o11y-metamonitoring/public_endpoint.log'>
Oct 02 06:09:41 alert1002 gunicorn[3038713]: During handling of the above exception, another exception occurred:
Oct 02 06:09:41 alert1002 gunicorn[3038713]: Traceback (most recent call last):
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1114, in emit
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.flush()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1094, in flush
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.stream.flush()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 242, in handle_chld
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.reap_workers()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 530, in reap_workers
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.log.warning(
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/glogging.py", line 261, in warning
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.error_log.warning(msg, *args, **kwargs)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1501, in warning
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self._log(WARNING, msg, args, **kwargs)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1634, in _log
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.handle(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1644, in handle
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.callHandlers(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1706, in callHandlers
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     hdlr.handle(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 978, in handle
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.emit(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1230, in emit
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     StreamHandler.emit(self, record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1118, in emit
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.handleError(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1031, in handleError
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     sys.stderr.write('--- Logging error ---\n')
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 242, in handle_chld
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.reap_workers()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 530, in reap_workers
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.log.warning(
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/glogging.py", line 261, in warning
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.error_log.warning(msg, *args, **kwargs)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1501, in warning
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self._log(WARNING, msg, args, **kwargs)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1634, in _log
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.handle(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1644, in handle
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.callHandlers(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1706, in callHandlers
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     hdlr.handle(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 978, in handle
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.emit(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1230, in emit
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     StreamHandler.emit(self, record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1118, in emit
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.handleError(record)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3.11/logging/__init__.py", line 1031, in handleError
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     sys.stderr.write('--- Logging error ---\n')
Oct 02 06:09:41 alert1002 gunicorn[3038713]: RuntimeError: reentrant call inside <_io.BufferedWriter name='<stderr>'>
Oct 02 06:09:41 alert1002 gunicorn[3038713]: Call stack:
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/bin/gunicorn", line 33, in <module>
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     sys.exit(load_entry_point('gunicorn==20.1.0', 'console_scripts', 'gunicorn')())
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/app/wsgiapp.py", line 67, in run
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/app/base.py", line 231, in run
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     super().run()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/app/base.py", line 72, in run
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     Arbiter(self).run()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 227, in run
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.halt()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 342, in halt
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.stop()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 393, in stop
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     time.sleep(0.1)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 242, in handle_chld
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.reap_workers()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 513, in reap_workers
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     wpid, status = os.waitpid(-1, os.WNOHANG)
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 242, in handle_chld
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.reap_workers()
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 530, in reap_workers
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.log.warning(
Oct 02 06:09:41 alert1002 gunicorn[3038713]:   File "/usr/lib/python3/dist-packages/gunicorn/glogging.py", line 261, in warning
Oct 02 06:09:41 alert1002 gunicorn[3038713]:     self.error_log.warning(msg, *args, **kwargs)
Oct 02 06:09:41 alert1002 gunicorn[3038713]: Message: 'Worker with pid %s was terminated due to signal %s'
Oct 02 06:09:41 alert1002 gunicorn[3038713]: Arguments: (3039077, 15)
Oct 02 06:10:11 alert1002 systemd[1]: metamonitoring_public_endpoint.service: Deactivated successfully.
Oct 02 06:10:11 alert1002 systemd[1]: Stopped metamonitoring_public_endpoint.service - Gunicorn for metamonitor_public_endpoint.
Oct 02 06:10:11 alert1002 systemd[1]: metamonitoring_public_endpoint.service: Consumed 13.912s CPU time.
Oct 02 06:10:11 alert1002 systemd[1]: Started metamonitoring_public_endpoint.service - Gunicorn for metamonitor_public_endpoint.

The service worked normally a few minutes before and a few minutes after this, and the page resolved automatically.
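A note on the traceback: the call stack shows gunicorn's arbiter entering `handle_chld` directly from inside `time.sleep(0.1)`, and again from inside a previous `reap_workers` call, i.e. from signal context. The nested handler then logs to the same stream the interrupted code was flushing, and CPython's buffered I/O rejects that with `RuntimeError: reentrant call inside <_io.BufferedWriter ...>`. The follow-up changes below sidestep this by replacing Gunicorn with uWSGI. For comparison only, the sketch shows the generic mitigation of keeping signal handlers free of I/O and deferring the work to the main loop; it is not gunicorn's code, and the logger name and use of `SIGUSR1` are illustrative assumptions.

```python
import logging
import signal
from collections import deque
from typing import List

log = logging.getLogger("deferred-signals-sketch")  # hypothetical logger name

# Signal numbers received but not yet processed. The handler below only
# appends to this queue; it never touches a log handler or stream.
_pending = deque()


def _on_signal(signum, frame):
    # Python runs this between bytecodes of the main thread, possibly while
    # that thread is in the middle of write()/flush() on a log file. Doing
    # I/O here is what can raise "reentrant call inside <_io.BufferedWriter>",
    # so only record the event and return.
    _pending.append(signum)


def drain_pending() -> List[int]:
    """Handle deferred signals from the main loop, where logging is safe."""
    handled = []
    while _pending:
        signum = _pending.popleft()
        log.warning("handling deferred signal %s", signal.Signals(signum).name)
        handled.append(signum)
    return handled


def install(signum: int = signal.SIGUSR1) -> None:
    """Register the deferring handler (must be called from the main thread)."""
    signal.signal(signum, _on_signal)
```

A real main loop would call `drain_pending()` on every iteration; `signal.set_wakeup_fd()` can additionally wake a `select()`-based loop as soon as a signal arrives.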

Change #1193109 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] metamonitoring: replace Gunicorn with uWSGI

https://gerrit.wikimedia.org/r/1193109

Change #1193109 merged by Tiziano Fogli:

[operations/puppet@production] metamonitoring: replace Gunicorn with uWSGI

https://gerrit.wikimedia.org/r/1193109

Change #1193373 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] metamonitoring: cleanup gunicorn related resources

https://gerrit.wikimedia.org/r/1193373

Change #1193374 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] metamonitoring: rename uwsgi resource

https://gerrit.wikimedia.org/r/1193374

Change #1193375 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] metamonitoring: prepare deadmanswitchamhook's Gunicorn replacement with uWSGI

https://gerrit.wikimedia.org/r/1193375

Change #1193376 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] metamonitoring: replace deadmanswitchamhook's Gunicorn with uWSGI

https://gerrit.wikimedia.org/r/1193376

Change #1193381 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] metamonitoring: add env vars to uwsgi process

https://gerrit.wikimedia.org/r/1193381

Change #1193382 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] metamonitoring: cleanup unneeded env files

https://gerrit.wikimedia.org/r/1193382

Change #1193373 merged by Tiziano Fogli:

[operations/puppet@production] metamonitoring: cleanup gunicorn related resources

https://gerrit.wikimedia.org/r/1193373

Change #1193374 merged by Tiziano Fogli:

[operations/puppet@production] metamonitoring: rename uwsgi resource

https://gerrit.wikimedia.org/r/1193374

Change #1193375 merged by Tiziano Fogli:

[operations/puppet@production] metamonitoring: prepare deadmanswitchamhook's Gunicorn replacement with uWSGI

https://gerrit.wikimedia.org/r/1193375

Change #1193376 merged by Tiziano Fogli:

[operations/puppet@production] metamonitoring: replace deadmanswitchamhook's Gunicorn with uWSGI

https://gerrit.wikimedia.org/r/1193376

Change #1193381 merged by Tiziano Fogli:

[operations/puppet@production] metamonitoring: add env vars to uwsgi process

https://gerrit.wikimedia.org/r/1193381

Change #1193382 merged by Tiziano Fogli:

[operations/puppet@production] metamonitoring: cleanup unneeded env files

https://gerrit.wikimedia.org/r/1193382

Change #1194121 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] metamonitoring: avoid unnecessary public endpoint restarts

https://gerrit.wikimedia.org/r/1194121

Change #1194121 merged by Tiziano Fogli:

[operations/puppet@production] metamonitoring: avoid unnecessary public endpoint restarts

https://gerrit.wikimedia.org/r/1194121

Change #1202990 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[labs/private@master] metamonitoring/icinga/ext-mon: add dummy basic auth info

https://gerrit.wikimedia.org/r/1202990

Change #1202990 merged by Tiziano Fogli:

[labs/private@master] metamonitoring/icinga/ext-mon: add dummy basic auth info

https://gerrit.wikimedia.org/r/1202990

Change #1203405 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[labs/private@master] metamonitoring/icinga/ext-mon: add dummy basic auth info

https://gerrit.wikimedia.org/r/1203405

Change #1203405 merged by Tiziano Fogli:

[labs/private@master] metamonitoring/icinga/ext-mon: add dummy basic auth info

https://gerrit.wikimedia.org/r/1203405

Change #1203845 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] metamonitoring: add icinga module

https://gerrit.wikimedia.org/r/1203845

Change #1203845 merged by Tiziano Fogli:

[operations/puppet@production] metamonitoring: add icinga module

https://gerrit.wikimedia.org/r/1203845