16:58, 9 May 2018 Jianhui67 (talk | contribs) globally renamed Hosiryuhosi to Rxy (Requested)
Rename stuck "queued" at all wikis at https://deployment.wikimedia.beta.wmflabs.org/wiki/Special:GlobalRenameProgress/Rxy
cpjobqueue: KafkaConsumer is not connected
I see lots of these in Logstash:
cpjobqueue: KafkaConsumer is not connected
    at Function.createLibrdkafkaError [as create] (/srv/deployment/cpjobqueue/deploy-cache/revs/5c1dcb96e0539f63ec033a845d2150283c211493/node_modules/node-rdkafka/lib/error.js:260:10)
    at /srv/deployment/cpjobqueue/deploy-cache/revs/5c1dcb96e0539f63ec033a845d2150283c211493/node_modules/node-rdkafka/lib/kafka-consumer.js:442:29
If the jobqueue is not working, then it makes sense that the global rename isn't working either.
Puppet is also failing on the deployment-kafka-jumbo-{1,2} machines:
The last Puppet run was at Wed May 9 16:40:22 UTC 2018 (1083 minutes ago).
maurelio@deployment-kafka-jumbo-1:~$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Could not find data item profile::kafka::mirror::source_cluster_name in any Hiera data file and no default supplied at /etc/puppet/modules/profile/manifests/kafka/mirror.pp:48:33 on node deployment-kafka-jumbo-1.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
The last Puppet run was at Wed May 9 16:37:39 UTC 2018 (1088 minutes ago).
maurelio@deployment-kafka-jumbo-2:~$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Could not find data item profile::kafka::mirror::source_cluster_name in any Hiera data file and no default supplied at /etc/puppet/modules/profile/manifests/kafka/mirror.pp:48:33 on node deployment-kafka-jumbo-2.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
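Both runs fail on the same missing Hiera key. For anyone checking more hosts, here is a rough Python sketch that pulls the key, manifest location and node name out of saved `puppet agent` output. The function name `find_missing_hiera_keys` and the regular expression are my own assumptions, written against the exact message wording shown above; this is not an existing tool, and other Puppet versions may word the error differently.

```python
import re

# Assumption: the error wording matches the Puppet output pasted above.
MISSING_KEY_RE = re.compile(
    r"Could not find data item (?P<key>\S+) in any Hiera data file"
    r".*? at (?P<manifest>\S+?):(?P<line>\d+):(?P<col>\d+)"
    r" on node (?P<node>\S+)"
)


def find_missing_hiera_keys(output):
    """Return one dict per 'missing Hiera data item' error found in the text."""
    return [m.groupdict() for m in MISSING_KEY_RE.finditer(output)]


# Error line copied from the deployment-kafka-jumbo-1 run above.
sample = (
    "Error: Could not retrieve catalog from remote server: Error 500 on SERVER: "
    "Server Error: Evaluation Error: Error while evaluating a Resource Statement, "
    "Evaluation Error: Error while evaluating a Function Call, "
    "Could not find data item profile::kafka::mirror::source_cluster_name "
    "in any Hiera data file and no default supplied at "
    "/etc/puppet/modules/profile/manifests/kafka/mirror.pp:48:33 "
    "on node deployment-kafka-jumbo-1.deployment-prep.eqiad.wmflabs"
)

for hit in find_missing_hiera_keys(sample):
    # Print node, missing key, and where the lookup was made.
    print(hit["node"], hit["key"], hit["manifest"] + ":" + hit["line"])
```

The sketch only reports which lookup failed and where. It does not say whether the fix is to supply the key in Hiera or to stop applying the class to the node.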
@Ottomata Hi. I see at https://github.com/wikimedia/puppet/commits/production that you made some commits yesterday with 'kafka' in the title. Could any of those be the cause? Thanks :)
Mentioned in SAL (#wikimedia-releng) [2018-05-10T13:19:30Z] <Hauskatze> maurelio@deployment-tin:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=deploymentwiki --logwiki=deploymentwiki 'Hosiryuhosi' 'Rxy' | T194376
I've fixed the stuck global rename after the jobqueue restart. The rename finished, but none of the accounts ended up attached, so I think I'll have to run an attachment script. However, the reported error cpjobqueue: KafkaConsumer is not connected continues to flood Logstash, so, as we discussed on IRC, let's see if @Ottomata can help here and stop Kafka from misbehaving (plus the Puppet errors on the kafka-jumbo machines above). Thanks.
Ah, the jumbo nodes are failing because I changed a prod puppet class but did not uninclude it from deployment-prep horizon. Should be unrelated. Will fix.
The commits yesterday were about upgrading Kafka from 0.9.0.1 to 1.1.0 in production. The deployment-prep upgrade was done last week.
I will look, though. We might need Petr's help if this is about job queue clients.