Jobs (or at least some of them) are not being executed on the Beta Cluster. logstash-beta shows several cpjobqueue, eventbus and kafka errors, with severities going all the way up to EMERGENCY. For example, no global renames are being processed, and extensions/CentralAuth/maintenance/fixStuckGlobalRename.php can't locate the jobs in the queue either, which suggests that jobs aren't being added to the queues for some reason. See https://deployment.wikipedia.beta.wmflabs.org/wiki/Special:GlobalRenameProgress and the parent task. See also https://logstash-beta.wmflabs.org/goto/8435fc9247afcb5d5647b93803f97a41
|cloud/instance-puppet : master||deployment-mediawiki-parsoid10: Switch labmon1001 to cloudmetrics1002|
|Resolved||None||T241294 Global renames aren't being processed on beta cluster|
|Resolved||Pchelolo||T241448 Job queue broken on Beta Cluster|
I performed a rename yesterday, after various instance and service restarts and other maintenance (cf. T241462). It was a brand new spambot account with no edits and just three wikis attached. It worked but took ca. 6 minutes to complete. That is certainly not a normal execution time compared to production, where such renames take seconds to complete.
I am also seeing messages from deployment-jobrunner03 marking the start and finish of job executions.
Despite that, I'm not entirely sure the issue with JobQueue, Kafka, Redis, etc. is really fixed here, so I'd appreciate it if someone familiar with this could take a look.
It worked but took ca. 6 minutes to complete.
I don't think it is possible to debug this issue any more, since the logs have probably been rotated out by now.
I'm going to close this ticket. The job queue works now, and this is certainly no longer an "unbreak now" issue. Please open a new ticket if you keep seeing problems with the job queue in beta.