We have had an elasticsearch issue causing problems with creating search indices for new wikis and also with searching on existing wikis since around 2023-12-25.
impact
Since 2023-12-25 newly created Wikibases will have had failing search.
Since 2023-12-27 existing Wikibases may have slow, unreliable or outdated search results
timeline
2023-12-25
first wbstackElasticSearchInit error occurs
2023-12-26
more instances occur
2023-12-27
team investigates the situation
Prometheus metric exporter is deployed
Shard limit per-node is increased to 850 from 800
2023-12-28
Attempt is made to restart failing master node
It's observed that all master nodes are now logging errors relating to slow garbage collection.
2024-01-08
Memory of masters is increased from 8->10GB and heap from 4-5GB PR
Memory is further increased to be 25% of the datanode memory PR
details
The initially noticed error
RuntimeException: wbstackElasticSearchInit call for 816 was not successful:{"wbstackElasticSearchInit":{"script":"extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php","return":1,"output":["Updating cluster default...","indexing namespaces...","\tIndexing namespaces...done","content index...","\tFetching Elasticsearch version...7.10.2...ok","\tScanning available plugins...","\t\texperimental-highlighter, extra","\tPicking analyzer...english","\tInferring index identifier...mwdb_6c0c916014_content_first","\tCreating index..."]}}: at .App\Jobs\CirrusSearch\ElasticSearchIndexInit->handleResponse ( /var/www/html/app/Jobs/CirrusSearch/CirrusSearchJob.php:107 ) at .App\Jobs\CirrusSearch\CirrusSearchJob->handle ( /var/www/html/app/Jobs/CirrusSearch/CirrusSearchJob.php:84 ) at .Illuminate\Container\BoundMethod::Illuminate\Container\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36 ) at .Illuminate\Container\Util::unwrapIfClosure ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Util.php:40 ) at .Illuminate\Container\BoundMethod::callBoundMethod ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:93 ) at .Illuminate\Container\BoundMethod::call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:37 ) at .Illuminate\Container\Container->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Container.php:653 ) at .Illuminate\Bus\Dispatcher->Illuminate\Bus\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php:128 ) at .Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:128 ) at .Illuminate\Pipeline\Pipeline->then ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:103 ) at .Illuminate\Bus\Dispatcher->dispatchNow ( /var/www/html/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php:132 ) at .Illuminate\Queue\CallQueuedHandler->Illuminate\Queue\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:120 ) at .Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:128 ) at .Illuminate\Pipeline\Pipeline->then ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:103 ) at .Illuminate\Queue\CallQueuedHandler->dispatchThroughMiddleware ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:122 ) at .Illuminate\Queue\CallQueuedHandler->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:70 ) at .Illuminate\Queue\Jobs\Job->fire ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Jobs/Job.php:98 ) at .Illuminate\Queue\Worker->process ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:428 ) at .Illuminate\Queue\Worker->runJob ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:378 ) at .Illuminate\Queue\Worker->daemon ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:172 ) at .Illuminate\Queue\Console\WorkCommand->runWorker ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:117 ) at .Illuminate\Queue\Console\WorkCommand->handle ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:101 ) at .Illuminate\Container\BoundMethod::Illuminate\Container\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36 ) at .Illuminate\Container\Util::unwrapIfClosure ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Util.php:40 ) at .Illuminate\Container\BoundMethod::callBoundMethod ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:93 ) at .Illuminate\Container\BoundMethod::call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:37 ) at .Illuminate\Container\Container->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Container.php:653 ) at .Illuminate\Console\Command->execute ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php:136 ) at .Symfony\Component\Console\Command\Command->run ( /var/www/html/vendor/symfony/console/Command/Command.php:298 ) at .Illuminate\Console\Command->run ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php:121 ) at .Symfony\Component\Console\Application->doRunCommand ( /var/www/html/vendor/symfony/console/Application.php:1040 ) at .Symfony\Component\Console\Application->doRun ( /var/www/html/vendor/symfony/console/Application.php:301 ) at .Symfony\Component\Console\Application->run ( /var/www/html/vendor/symfony/console/Application.php:171 ) at .Illuminate\Console\Application->run ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Application.php:94 ) at .Illuminate\Foundation\Console\Kernel->handle ( /var/www/html/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:129 ) at .{main} ( /var/www/html/artisan:37 ) at .App\Providers\AppServiceProvider->App\Providers\{closure} at .Illuminate\Events\Dispatcher->Illuminate\Events\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:404 ) at .Illuminate\Events\Dispatcher->dispatch ( /var/www/html/vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:249 ) at .Illuminate\Queue\Jobs\Job->fail ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Jobs/Job.php:195 ) at .App\Jobs\Job->fail ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/InteractsWithQueue.php:47 ) at .App\Jobs\CirrusSearch\CirrusSearchJob->validateSuccess ( /var/www/html/app/Jobs/CirrusSearch/CirrusSearchJob.php:107 ) at .App\Jobs\CirrusSearch\ElasticSearchIndexInit->handleResponse ( /var/www/html/app/Jobs/CirrusSearch/ElasticSearchIndexInit.php:26 ) at .App\Jobs\CirrusSearch\CirrusSearchJob->handle ( /var/www/html/app/Jobs/CirrusSearch/CirrusSearchJob.php:84 ) at .Illuminate\Container\BoundMethod::Illuminate\Container\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36 ) at .Illuminate\Container\Util::unwrapIfClosure ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Util.php:40 ) at .Illuminate\Container\BoundMethod::callBoundMethod ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:93 ) at .Illuminate\Container\BoundMethod::call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:37 ) at .Illuminate\Container\Container->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Container.php:653 ) at .Illuminate\Bus\Dispatcher->Illuminate\Bus\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php:128 ) at .Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:128 ) at .Illuminate\Pipeline\Pipeline->then ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:103 ) at .Illuminate\Bus\Dispatcher->dispatchNow ( /var/www/html/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php:132 ) at .Illuminate\Queue\CallQueuedHandler->Illuminate\Queue\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:120 ) at .Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:128 ) at .Illuminate\Pipeline\Pipeline->then ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:103 ) at .Illuminate\Queue\CallQueuedHandler->dispatchThroughMiddleware ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:122 ) at .Illuminate\Queue\CallQueuedHandler->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:70 ) at .Illuminate\Queue\Jobs\Job->fire ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Jobs/Job.php:98 ) at .Illuminate\Queue\Worker->process ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:428 ) at .Illuminate\Queue\Worker->runJob ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:378 ) at .Illuminate\Queue\Worker->daemon ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:172 ) at .Illuminate\Queue\Console\WorkCommand->runWorker ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:117 ) at .Illuminate\Queue\Console\WorkCommand->handle ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:101 ) at .Illuminate\Container\BoundMethod::Illuminate\Container\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36 ) at .Illuminate\Container\Util::unwrapIfClosure ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Util.php:40 ) at .Illuminate\Container\BoundMethod::callBoundMethod ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:93 ) at .Illuminate\Container\BoundMethod::call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:37 ) at .Illuminate\Container\Container->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Container.php:653 ) at .Illuminate\Console\Command->execute ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php:136 ) at .Symfony\Component\Console\Command\Command->run ( /var/www/html/vendor/symfony/console/Command/Command.php:298 ) at .Illuminate\Console\Command->run ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php:121 ) at .Symfony\Component\Console\Application->doRunCommand ( /var/www/html/vendor/symfony/console/Application.php:1040 ) at .Symfony\Component\Console\Application->doRun ( /var/www/html/vendor/symfony/console/Application.php:301 ) at .Symfony\Component\Console\Application->run ( /var/www/html/vendor/symfony/console/Application.php:171 ) at .Illuminate\Console\Application->run ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Application.php:94 ) at .Illuminate\Foundation\Console\Kernel->handle ( /var/www/html/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:129 ) at .{main} ( /var/www/html/artisan:37 )
The first meaningful errors seen in the elasticsearch logs were:
on the second master elasticsearch-1-master-1 there is flurry off slow cluster state computing and the corresponding flapping in health.
[2023-12-24T06:37:10,628][INFO ][o.e.c.m.MetadataCreateIndexService] [elasticsearch-1-master-1] [mwdb_e94e6a3fa5_content_first] creating index, cause [api], templates [mwdb], shards [1]/[1] [0/1994] [2023-12-24T06:37:10,679][WARN ][o.e.c.s.MasterService ] [elasticsearch-1-master-1] took [13.4s], which is over [10s], to compute cluster state update for [create-index [mwdb_e94e6a3fa5_content_first], cause [api]] [2023-12-24T06:37:25,056][INFO ][o.e.c.m.MetadataMappingService] [elasticsearch-1-master-1] [mwdb_e94e6a3fa5_content_first/7q_O3tHqTsOSEUpMKUA9ag] create_mapping [_doc] [2023-12-24T06:37:25,202][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-1-master-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[mwdb_e94e6a3fa5_content_first][0]]]). [2023-12-24T06:37:45,299][INFO ][o.e.c.m.MetadataCreateIndexService] [elasticsearch-1-master-1] [mwdb_e94e6a3fa5_general_first] creating index, cause [api], templates [mwdb], shards [1]/[1] [2023-12-24T06:37:45,341][WARN ][o.e.c.s.MasterService ] [elasticsearch-1-master-1] took [13.2s], which is over [10s], to compute cluster state update for [create-index [mwdb_e94e6a3fa5_general_first], cause [api]] [2023-12-24T06:37:59,459][INFO ][o.e.c.m.MetadataMappingService] [elasticsearch-1-master-1] [mwdb_e94e6a3fa5_general_first/8Knb2WMMRkyUd__AxQ6llA] create_mapping [_doc] [2023-12-24T06:37:59,604][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-1-master-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[mwdb_e94e6a3fa5_general_first][0]]]). [2023-12-24T23:34:20,031][INFO ][o.e.c.m.MetadataCreateIndexService] [elasticsearch-1-master-1] [mwdb_cf06321a6a_content_first] creating index, cause [api], templates [mwdb], shards [1]/[1] [2023-12-24T23:34:20,082][WARN ][o.e.c.s.MasterService ] [elasticsearch-1-master-1] took [13.8s], which is over [10s], to compute cluster state update for [create-index [mwdb_cf06321a6a_content_first], cause [api]] [2023-12-24T23:34:37,149][INFO ][o.e.c.m.MetadataMappingService] [elasticsearch-1-master-1] [mwdb_cf06321a6a_content_first/zEVVMI2ZQoC9j5DpUzXwLg] create_mapping [_doc] [2023-12-24T23:34:37,294][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-1-master-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[mwdb_cf06321a6a_content_first][0]]]). [2023-12-24T23:34:57,862][INFO ][o.e.c.m.MetadataCreateIndexService] [elasticsearch-1-master-1] [mwdb_cf06321a6a_general_first] creating index, cause [api], templates [mwdb], shards [1]/[1] [2023-12-24T23:34:57,905][WARN ][o.e.c.s.MasterService ] [elasticsearch-1-master-1] took [13.6s], which is over [10s], to compute cluster state update for [create-index [mwdb_cf06321a6a_general_first], cause [api]] [2023-12-24T23:35:12,614][INFO ][o.e.c.m.MetadataMappingService] [elasticsearch-1-master-1] [mwdb_cf06321a6a_general_first/ScAe44OHTjqT3kzJoTwvRQ] create_mapping [_doc] [2023-12-24T23:35:12,766][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-1-master-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[mwdb_cf06321a6a_general_first][0]]]). [2023-12-25T22:22:15,885][WARN ][o.e.t.InboundHandler ] [elasticsearch-1-master-1] handling inbound transport message [InboundMessage{Header{268644}{7.10.2}{4883636}{true}{false}{false}{false}{indices:admin/create}}] took [6803ms] which is above the warn threshold of [5000ms] [2023-12-26T11:04:50,542][WARN ][o.e.t.InboundHandler ] [elasticsearch-1-master-1] handling inbound transport message [InboundMessage{Header{268644}{7.10.2}{5425478}{true}{false}{false}{false}{indices:admin/create}}] took [6603ms] which is above the warn threshold of [5000ms] [2023-12-26T17:06:54,513][WARN ][o.e.t.InboundHandler ] [elasticsearch-1-master-1] handling inbound transport message [InboundMessage{Header{268644}{7.10.2}{4954606}{true}{false}{false}{false}{indices:admin/create}}] took [6602ms] which is above the warn threshold of [5000ms]