
Avoid false warnings in rakkaus/#mediawiki-i18n- [] JobQueue is not running
Open, Needs Triage · Public


This is a cron job that runs every 10 minutes to check the status. The job queue is processed by a systemd job that restarts automatically (after a few seconds' delay) when it reaches the limit of jobs per execution (1000) or on failure (I believe; I didn't check). This sometimes causes false positives when a check coincides with a restart.

The check was added because sometimes (for example after a reboot?) the job queue didn't run for days, and that caused a lot of issues. Maybe check the number of jobs in the queue instead and warn if it is high? Or check the age of the oldest job?
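
A minimal sketch of that alternative check, assuming the queue length and the timestamp of the oldest queued job have already been fetched by some other means. The function name `queue_looks_stuck` and both thresholds are hypothetical; nothing in this task specifies values.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds; the task does not specify any values.
MAX_QUEUED_JOBS = 5000
MAX_OLDEST_JOB_AGE = timedelta(hours=1)


def queue_looks_stuck(
    queued_jobs: int,
    oldest_job_queued_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True if the queue is too long or its oldest job is too old."""
    if queued_jobs > MAX_QUEUED_JOBS:
        return True
    # An empty queue has no oldest job, so only compare when one exists.
    if oldest_job_queued_at is not None:
        return now - oldest_job_queued_at > MAX_OLDEST_JOB_AGE
    return False
```

Unlike a "is the runner process alive right now" check, neither signal changes during a restart that lasts a few seconds, so a restart on its own would not trigger a warning.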

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 30 2019, 8:47 AM
abi_ added a subscriber: abi_. · May 1 2019, 4:33 AM
abi_ added a comment. · May 6 2019, 2:33 AM

Another option may be to warn only if multiple consecutive checks (say 3) have failed. The script would have to write the last few detected states to a file, which it can read on startup or when a check fails. This should also reduce the false positives.
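
A rough sketch of that idea, assuming the cron script can call a helper after each check. The function name `record_and_should_warn`, the JSON state format, and the state file location are all hypothetical.

```python
import json
from pathlib import Path

REQUIRED_FAILURES = 3  # "say 3"; an illustrative value, not a tested one


def record_and_should_warn(
    state_file: Path,
    check_ok: bool,
    required: int = REQUIRED_FAILURES,
) -> bool:
    """Store the latest check result and report whether to warn.

    Warns only when the last `required` checks, including this one, failed.
    """
    try:
        history = json.loads(state_file.read_text())
    except (FileNotFoundError, ValueError):
        history = []  # first run, or an unreadable state file
    if not isinstance(history, list):
        history = []
    # Keep only as many results as are needed for the decision.
    history = (history + [check_ok])[-required:]
    state_file.write_text(json.dumps(history))
    return len(history) == required and not any(history)
```

With a 10-minute cron interval and `required=3`, a genuinely stopped queue would be reported after roughly 20 to 30 minutes, while a single check that coincides with a restart would be ignored.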

Though I think the oldest-job approach might be best. In general, how long does a job spend in the queue?

We don't have stats on the job queue, so I don't know.

I made the holdoff time between service restarts much shorter. The false warning is now less common but still happens.