We are getting production error messages from job runners:
Fatal error: entire web request took longer than 1200 seconds and timed out in /srv/mediawiki/php-1.33.0-wmf.22/vendor/ruflin/elastica/lib/Elastica/Task.php on line 92
[XJkH0QpAMFwAAH461NEAAAAC] /rpc/RunSingleJob.php PHP Fatal Error from line 92 of /srv/mediawiki/php-1.33.0-wmf.22/vendor/ruflin/elastica/lib/Elastica/Task.php: entire web request took longer than 1200 seconds and timed out
These errors appear to have started on March 7th, with the rollout of wmf.20. The most likely related patch deployed that week is: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Elastica/+/493160/
MWElasticUtils::deleteByQuery did not work correctly until the wmf.20 deploy, and these new errors started appearing as soon as it was deployed. Monitoring /_cat/tasks does not appear to show any long-running deleteByQuery tasks (task action = indices:data/write/delete/byquery). Our best guess is that the code is not correctly detecting task completion and instead keeps waiting until the job is killed for exceeding the 20-minute (1200-second) job time limit.
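For comparison, here is a minimal sketch of the kind of completion check we would expect. It is written in Python rather than the PHP used by MWElasticUtils and Elastica. It assumes the Elasticsearch Tasks API (`GET _tasks/<task_id>`) reports completion through a top-level `completed` boolean. `wait_for_task`, `fetch_status` and `TaskTimeout` are hypothetical names, not part of MWElasticUtils or Elastica. The deadline is kept well below the 1200-second job limit, so a task that never reports completion fails with a clear error instead of a fatal request timeout.

```python
import time
from typing import Any, Callable, Dict


class TaskTimeout(Exception):
    """Raised when a task has not reported completion before the deadline."""


def wait_for_task(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    timeout: float = 600.0,
    poll_interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task status until it reports completed == True.

    fetch_status is a hypothetical callable that performs
    GET _tasks/<task_id> and returns the decoded JSON body.
    clock and sleep are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(task_id)
        # Assumption: completion is signalled by a top-level boolean,
        # not by the presence of some other field in the body.
        if status.get("completed") is True:
            return status
        # Give up well before the job runner's hard limit would kill us.
        if clock() >= deadline:
            raise TaskTimeout(f"task {task_id} not completed after {timeout}s")
        sleep(poll_interval)
```

Injecting `clock` and `sleep` lets the polling loop be exercised with fake responses. This makes it easy to reproduce a status body that never satisfies the completion test, which is the failure mode suspected here.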
Investigate and repair.