Add alert(s) for unusual job processing backlog increases
Open, Needs Triage, Public

Description

Recently, an unexpected use of the release timestamp feature blocked the job queue, and a large backlog of unprocessed jobs of various types built up rapidly. No alerts fired; the problem was first identified by users who noticed that certain tasks that depend on the job queue were not being completed as expected. An automated alert should be triggered when the job queue's backlog of unprocessed jobs rises to an unusual level or grows at an unexpected rate.

https://wikitech.wikimedia.org/wiki/Incident_documentation/20191211-MachineVision%2Bcpjobqueue
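
To make the two requested conditions concrete (an unusually large backlog, and an unexpectedly fast increase), here is a minimal sketch of the check. It is illustrative only: the function name `check_backlog`, the sample format, and both threshold values are assumptions made up for this example, not existing configuration, and real thresholds would need to be tuned against historical backlog data. In practice this logic would more likely be expressed as rules in the monitoring system than as standalone code.

```python
from typing import List, Sequence, Tuple

# Hypothetical thresholds, chosen only for illustration.
MAX_BACKLOG = 100_000        # backlog size treated as "unusual level"
MAX_GROWTH_PER_SEC = 50.0    # growth rate (jobs/s) treated as "unexpected rate"

# One sample is (unix timestamp in seconds, number of unprocessed jobs).
Sample = Tuple[float, int]


def check_backlog(
    samples: Sequence[Sample],
    max_backlog: int = MAX_BACKLOG,
    max_growth_per_sec: float = MAX_GROWTH_PER_SEC,
) -> List[str]:
    """Return alert reasons for a time-ordered window of backlog samples."""
    alerts: List[str] = []
    if not samples:
        return alerts

    first_ts, first_size = samples[0]
    last_ts, last_size = samples[-1]

    # Condition 1: the most recent backlog size is above the absolute limit.
    if last_size > max_backlog:
        alerts.append(f"backlog {last_size} exceeds {max_backlog}")

    # Condition 2: average growth across the window is above the rate limit.
    elapsed = last_ts - first_ts
    if elapsed > 0:
        rate = (last_size - first_size) / elapsed
        if rate > max_growth_per_sec:
            alerts.append(
                f"backlog growing at {rate:.1f} jobs/s, above {max_growth_per_sec}"
            )

    return alerts
```

The rate condition is the one that would catch an incident like this early: a queue that jumps from 1,000 to 10,000 jobs within a minute is still far below the absolute limit, but its growth rate is well above the assumed 50 jobs/s.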