RuntimeException: Received cirrusSearchElasticaWrite job for an unwritable cluster cloudelastic.
Closed, Resolved | Public | PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   RuntimeException: Received cirrusSearchElasticaWrite job for an unwritable cluster cloudelastic.
exception.trace
from /srv/mediawiki/php-1.42.0-wmf.18/extensions/CirrusSearch/includes/Job/JobTraits.php(92)
#0 /srv/mediawiki/php-1.42.0-wmf.18/extensions/CirrusSearch/includes/Job/ElasticaWrite.php(151): CirrusSearch\Job\CirrusGenericJob->decideClusters()
#1 /srv/mediawiki/php-1.42.0-wmf.18/extensions/CirrusSearch/includes/Job/JobTraits.php(144): CirrusSearch\Job\ElasticaWrite->doJob()
#2 /srv/mediawiki/php-1.42.0-wmf.18/extensions/EventBus/includes/JobExecutor.php(80): CirrusSearch\Job\CirrusGenericJob->run()
#3 /srv/mediawiki/rpc/RunSingleJob.php(60): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#4 {main}
Notes

30816 of these in the last 15 minutes. They have been arriving in big bursts since around Feb 8, 2024.
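
For readers unfamiliar with the failure mode, here is a minimal Python sketch of the kind of check the trace points at: the job names a target cluster, and the runner refuses it when that cluster is not in its writable set. This is not the CirrusSearch code. The function name `decide_clusters`, the `writable_clusters` parameter, and the cluster names other than cloudelastic are hypothetical, and the real logic in JobTraits.php may differ.

```python
class UnwritableClusterError(RuntimeError):
    """Raised when a job targets a cluster the runner may not write to."""


def decide_clusters(job_type, requested_cluster, writable_clusters):
    """Hypothetical stand-in for the check that raises in the trace.

    job_type:          job name, e.g. "cirrusSearchElasticaWrite"
    requested_cluster: cluster named in the job parameters, or None
    writable_clusters: set of cluster names this runner may write to
    """
    if requested_cluster is None:
        # Assumption: a job with no explicit cluster fans out to all writable ones.
        return sorted(writable_clusters)
    if requested_cluster not in writable_clusters:
        raise UnwritableClusterError(
            f"Received {job_type} job for an unwritable cluster {requested_cluster}."
        )
    return [requested_cluster]


if __name__ == "__main__":
    # Hypothetical runner configuration that does not list cloudelastic.
    writable = {"eqiad", "codfw"}
    try:
        decide_clusters("cirrusSearchElasticaWrite", "cloudelastic", writable)
    except UnwritableClusterError as err:
        print(err)
```

Under this reading, a mismatch between the configuration that enqueues jobs and the configuration the job runners see would produce the error in bursts, once per rejected job.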

Details

Request URL
https://mw-jobrunner.discovery.wmnet/rpc/RunSingleJob.php

Event Timeline

I note these are all coming from kube-mw-jobrunners.

Not hard to imagine a scenario where this interferes with canaries during a deployment.

I think this is fallout of T349796: Move MediaWiki jobs to mw-on-k8s?

Hrm, this seems to have started Feb 8, which doesn't quite align with actions on that task, but tagging in serviceops to confirm.

Going back to IRC logs from a bit ago, what I had in mind was T355617: Migrate cloudelastic from public to private IPs.

931134 2024-02-06 12:13:07     dancy   Many `.17 e/C/i/J/JobTraits:92  Received cirrusSearchElasticaWrite job for an unwritable cluster cloudelastic.`
931136 2024-02-06 12:13:14     dancy   brennen:
931137 2024-02-06 12:13:26     dancy   although they're trailing off.
931138 2024-02-06 12:13:53     inflatador      dancy what was the timeline for those errors? We've been migrating cloudelastic to private IPs
931139 2024-02-06 12:13:53     dancy   and now gone. :-)
931140 2024-02-06 12:14:07     dancy   I was looking at last 15 minutes..  and then they went away
931141 2024-02-06 12:14:41     dancy   inflatador: Thanks for the info!
931142 2024-02-06 12:14:54     dancy   Crisis averted. :-)
931143 2024-02-06 12:15:15     inflatador      sure, more context in T355617 if interested. No impact expected, but that cluster doesn't have a lot of redundancy ;(

Ah ha, that ticket does align with the timing of this error message. Shifting tags again (sorry serviceops), tagging Data-Platform-SRE.