I noticed fatal execution-timeout errors from the job queue in Logstash. This is surprising, since jobs are typically (for better or worse) where we defer our slower and more costly computations.
A timeout of 180s seems too low for that, not least because even regular POST web requests are given 200 seconds.
When parsing a normal edit from an end-user, we allow 200s of processing time. But when a job is processing a batch of multiple cascading updates for edits, we abort after 180s? That seems broken.
In wmf-config/set-time-limit.php it says:
    case 'jobrunner.svc.eqiad.wmnet':
    case 'jobrunner.svc.codfw.wmnet':
    case 'jobrunner.discovery.wmnet':
        $limit = 1200;
        break;
    default:
        if ( $_SERVER['REQUEST_METHOD'] === 'POST' ) {
            $limit = 200;
        } else {
            $limit = 60;
        }
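Restating that logic as a standalone function makes the expectation explicit (a sketch for illustration only; wmfGetTimeLimit is a hypothetical name, not something from the repo):

    <?php
    // Hypothetical helper mirroring the quoted config logic: given the
    // server name and request method, return the limit that should apply.
    function wmfGetTimeLimit( string $serverName, string $requestMethod ): int {
        switch ( $serverName ) {
            case 'jobrunner.svc.eqiad.wmnet':
            case 'jobrunner.svc.codfw.wmnet':
            case 'jobrunner.discovery.wmnet':
                // Job runners are meant to get 20 minutes.
                return 1200;
            default:
                // Regular web requests: 200s for POST, 60s otherwise.
                return $requestMethod === 'POST' ? 200 : 60;
        }
    }

    // Per the quoted config, a job on jobrunner.discovery.wmnet should be
    // allowed 1200s. Notably, 180 does not appear anywhere in this logic.
    assert( wmfGetTimeLimit( 'jobrunner.discovery.wmnet', 'POST' ) === 1200 );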
So the fact that Logstash is showing fatal execution timeouts at 180s strongly suggests that something has recently regressed here. For example:
    Maximum execution time of 180 seconds exceeded
    …
    #25 /srv/mediawiki/rpc/RunSingleJob.php(76): JobExecutor->execute()
    …
    host: jobrunner.discovery.wmnet
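For what it's worth, "Maximum execution time of N seconds exceeded" is PHP's own max_execution_time fatal, which makes me suspect the 180s is coming from php.ini / php-fpm settings on the job runners rather than from set-time-limit.php. A throwaway diagnostic like the following (hypothetical, not in the repo), run at the start of a job, would confirm the effective limit on the affected hosts:

    <?php
    // Hypothetical diagnostic, not part of the codebase: log the limit PHP
    // actually enforces on this host, to see whether the 180s comes from
    // php.ini / php-fpm defaults rather than from set-time-limit.php.
    error_log( sprintf(
        'host=%s max_execution_time=%s',
        gethostname(),
        ini_get( 'max_execution_time' )
    ) );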