phd has stopped working a few times on August 14 (and before)
Closed, Duplicate (Public)

Description

See for example https://phabricator.wikimedia.org/diffusion/EMML/manage/.

Example:

09:33:12 <icinga-wm> PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
09:35:35 <icinga-wm> RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 4 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator

It may be worth checking the phd logs and Icinga to see whether this needs fixing.

Event Timeline

This was a temporary issue. The phd daemons sometimes crash. As far as I can see from Icinga, it has happened 3 times today, and each time it recovered within 3 minutes of the alert.
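For anyone who wants to check the recovery time from the alert lines themselves, here is a minimal sketch that pairs each PROBLEM line with the following RECOVERY line for the same check and reports the gap in seconds. It only uses the two lines quoted in the description (with the trailing wikitech URL left off). The names `LOG`, `LINE_RE` and `outage_durations` are made up for illustration and are not part of Icinga or Phabricator. It also assumes all timestamps fall on the same day.

```python
import re
from datetime import datetime

# The two alert lines quoted in the task description (trailing URL omitted).
LOG = """\
09:33:12 <icinga-wm> PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd)
09:35:35 <icinga-wm> RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 4 processes with UID = 497 (phd)
"""

# Hypothetical pattern matching the line shape seen above:
# time, bot nick, PROBLEM/RECOVERY, check name, then "is CRITICAL:" or "is OK:".
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) <icinga-wm> (PROBLEM|RECOVERY) - (.+?) is (?:CRITICAL|OK):"
)


def outage_durations(log_text):
    """Pair each PROBLEM with the next RECOVERY of the same check.

    Returns a list of (check_name, seconds). Assumes same-day timestamps,
    so an outage that spans midnight would not be measured correctly.
    """
    open_problems = {}  # check name -> time of the first unmatched PROBLEM
    durations = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        kind, check = match.group(2), match.group(3)
        if kind == "PROBLEM":
            open_problems.setdefault(check, stamp)
        elif check in open_problems:
            start = open_problems.pop(check)
            durations.append((check, int((stamp - start).total_seconds())))
    return durations


print(outage_durations(LOG))
# [('PHD should be supervising processes on phab1001', 143)]
```

For the example in the description, the gap is 143 seconds (2 minutes 23 seconds), which fits the "within 3 minutes" observation.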

RhinosF1 renamed this task from "Diffusion's working copy status mentions that the Pull and Task daemon are not running" to "phd has stopped working a few times on August 14 (and before)". Aug 14 2022, 1:41 PM
RhinosF1 updated the task description.