
Emit changeprop "worker died" messages as metrics
Open, Medium, Public

Description

When various things go wrong in changeprop, we get into a cycle where workers die silently. This is hard to see on graphs, apart from secondary effects such as growing backlogs, but the logs show bursts of workers dying. It'd be very useful if we could emit worker deaths as a metric so that these bursts are visible on graphs.

https://logstash.wikimedia.org/app/discover#/doc/2d891220-161a-11ea-a364-c747e6d6cfc2/logstash-syslog-2021.03.24?id=IvW-Y3gBA6MeBtBqQ6GM
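A minimal sketch of what such a metric could look like, assuming a Prometheus-style counter incremented in the master process each time a worker exits. The metric name `changeprop_worker_deaths_total`, the `reason` label, and the `WorkerDeathCounter`/`deathReason` helpers are all hypothetical and not part of changeprop or service-runner. Whether the real metrics pipeline is statsd or Prometheus is also an assumption here.

```typescript
// Hypothetical sketch: count worker deaths per reason and render them as a
// Prometheus-style counter. Names and labels are illustrative assumptions,
// not changeprop's or service-runner's actual metrics.

// Derive a low-cardinality reason label from the (code, signal) pair that
// Node's cluster 'exit' event reports; normally only one of them is set.
function deathReason(code: number | null, signal: string | null): string {
  if (signal !== null) {
    return `signal_${signal}`;
  }
  return code === null ? "unknown" : `exit_${code}`;
}

// Escape a label value as required by the Prometheus text exposition format.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

class WorkerDeathCounter {
  private readonly counts = new Map<string, number>();

  constructor(readonly metricName: string = "changeprop_worker_deaths_total") {}

  // Increment the series for this exit's reason.
  record(code: number | null, signal: string | null): void {
    const reason = deathReason(code, signal);
    this.counts.set(reason, (this.counts.get(reason) ?? 0) + 1);
  }

  // Current count for one reason (0 if it has never been seen).
  get(reason: string): number {
    return this.counts.get(reason) ?? 0;
  }

  // Render all series in text exposition format, sorted by reason for stable output.
  render(): string {
    const lines: string[] = [
      `# HELP ${this.metricName} Worker processes that died, by reason.`,
      `# TYPE ${this.metricName} counter`,
    ];
    for (const reason of Array.from(this.counts.keys()).sort()) {
      lines.push(`${this.metricName}{reason="${escapeLabel(reason)}"} ${this.counts.get(reason)}`);
    }
    return lines.join("\n") + "\n";
  }
}
```

In a Node master process this could be wired to the cluster module's `exit` event, e.g. `cluster.on("exit", (worker, code, signal) => counter.record(code, signal))`. Skipping exits where `worker.exitedAfterDisconnect` is true would keep planned restarts out of the count, so that a spike reflects the silent deaths described above.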