We have had an elasticsearch issue causing problems with creating search indices for new wikis and also with searching on existing wikis since around 2023-12-25.
impact
Since 2023-12-25 newly created Wikibases will have had failing search.
Since 2023-12-27 existing Wikibases may have slow, unreliable or outdated search results
timeline
2023-12-25
first wbstackElasticSearchInit error occurs
2023-12-26
more instances occur
2023-12-27
team investigates the situation
Prometheus metric exporter is deployed
Shard limit per-node is increased to 850 from 800
2023-12-28
Attempt is made to restart failing master node
It's observed that all master nodes are now logging errors relating to slow garbage collection.
2024-01-08
Memory of masters is increased from 8->10GB and heap from 4-5GB PR
Memory is further increased to be 25% of the datanode memory PR
details
The initially noticed error
RuntimeException: wbstackElasticSearchInit call for 816 was not successful:{"wbstackElasticSearchInit":{"script":"extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php","return":1,"output":["Updating cluster default...","indexing namespaces...","\tIndexing namespaces...done","content index...","\tFetching Elasticsearch version...7.10.2...ok","\tScanning available plugins...","\t\texperimental-highlighter, extra","\tPicking analyzer...english","\tInferring index identifier...mwdb_6c0c916014_content_first","\tCreating index..."]}}:
at .App\Jobs\CirrusSearch\ElasticSearchIndexInit->handleResponse ( /var/www/html/app/Jobs/CirrusSearch/CirrusSearchJob.php:107 )
at .App\Jobs\CirrusSearch\CirrusSearchJob->handle ( /var/www/html/app/Jobs/CirrusSearch/CirrusSearchJob.php:84 )
at .Illuminate\Container\BoundMethod::Illuminate\Container\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36 )
at .Illuminate\Container\Util::unwrapIfClosure ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Util.php:40 )
at .Illuminate\Container\BoundMethod::callBoundMethod ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:93 )
at .Illuminate\Container\BoundMethod::call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:37 )
at .Illuminate\Container\Container->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Container.php:653 )
at .Illuminate\Bus\Dispatcher->Illuminate\Bus\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php:128 )
at .Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:128 )
at .Illuminate\Pipeline\Pipeline->then ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:103 )
at .Illuminate\Bus\Dispatcher->dispatchNow ( /var/www/html/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php:132 )
at .Illuminate\Queue\CallQueuedHandler->Illuminate\Queue\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:120 )
at .Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:128 )
at .Illuminate\Pipeline\Pipeline->then ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:103 )
at .Illuminate\Queue\CallQueuedHandler->dispatchThroughMiddleware ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:122 )
at .Illuminate\Queue\CallQueuedHandler->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:70 )
at .Illuminate\Queue\Jobs\Job->fire ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Jobs/Job.php:98 )
at .Illuminate\Queue\Worker->process ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:428 )
at .Illuminate\Queue\Worker->runJob ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:378 )
at .Illuminate\Queue\Worker->daemon ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:172 )
at .Illuminate\Queue\Console\WorkCommand->runWorker ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:117 )
at .Illuminate\Queue\Console\WorkCommand->handle ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:101 )
at .Illuminate\Container\BoundMethod::Illuminate\Container\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36 )
at .Illuminate\Container\Util::unwrapIfClosure ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Util.php:40 )
at .Illuminate\Container\BoundMethod::callBoundMethod ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:93 )
at .Illuminate\Container\BoundMethod::call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:37 )
at .Illuminate\Container\Container->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Container.php:653 )
at .Illuminate\Console\Command->execute ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php:136 )
at .Symfony\Component\Console\Command\Command->run ( /var/www/html/vendor/symfony/console/Command/Command.php:298 )
at .Illuminate\Console\Command->run ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php:121 )
at .Symfony\Component\Console\Application->doRunCommand ( /var/www/html/vendor/symfony/console/Application.php:1040 )
at .Symfony\Component\Console\Application->doRun ( /var/www/html/vendor/symfony/console/Application.php:301 )
at .Symfony\Component\Console\Application->run ( /var/www/html/vendor/symfony/console/Application.php:171 )
at .Illuminate\Console\Application->run ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Application.php:94 )
at .Illuminate\Foundation\Console\Kernel->handle ( /var/www/html/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:129 )
at .{main} ( /var/www/html/artisan:37 )
at .App\Providers\AppServiceProvider->App\Providers\{closure}
at .Illuminate\Events\Dispatcher->Illuminate\Events\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:404 )
at .Illuminate\Events\Dispatcher->dispatch ( /var/www/html/vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:249 )
at .Illuminate\Queue\Jobs\Job->fail ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Jobs/Job.php:195 )
at .App\Jobs\Job->fail ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/InteractsWithQueue.php:47 )
at .App\Jobs\CirrusSearch\CirrusSearchJob->validateSuccess ( /var/www/html/app/Jobs/CirrusSearch/CirrusSearchJob.php:107 )
at .App\Jobs\CirrusSearch\ElasticSearchIndexInit->handleResponse ( /var/www/html/app/Jobs/CirrusSearch/ElasticSearchIndexInit.php:26 )
at .App\Jobs\CirrusSearch\CirrusSearchJob->handle ( /var/www/html/app/Jobs/CirrusSearch/CirrusSearchJob.php:84 )
at .Illuminate\Container\BoundMethod::Illuminate\Container\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36 )
at .Illuminate\Container\Util::unwrapIfClosure ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Util.php:40 )
at .Illuminate\Container\BoundMethod::callBoundMethod ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:93 )
at .Illuminate\Container\BoundMethod::call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:37 )
at .Illuminate\Container\Container->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Container.php:653 )
at .Illuminate\Bus\Dispatcher->Illuminate\Bus\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php:128 )
at .Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:128 )
at .Illuminate\Pipeline\Pipeline->then ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:103 )
at .Illuminate\Bus\Dispatcher->dispatchNow ( /var/www/html/vendor/laravel/framework/src/Illuminate/Bus/Dispatcher.php:132 )
at .Illuminate\Queue\CallQueuedHandler->Illuminate\Queue\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:120 )
at .Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:128 )
at .Illuminate\Pipeline\Pipeline->then ( /var/www/html/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php:103 )
at .Illuminate\Queue\CallQueuedHandler->dispatchThroughMiddleware ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:122 )
at .Illuminate\Queue\CallQueuedHandler->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/CallQueuedHandler.php:70 )
at .Illuminate\Queue\Jobs\Job->fire ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Jobs/Job.php:98 )
at .Illuminate\Queue\Worker->process ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:428 )
at .Illuminate\Queue\Worker->runJob ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:378 )
at .Illuminate\Queue\Worker->daemon ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:172 )
at .Illuminate\Queue\Console\WorkCommand->runWorker ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:117 )
at .Illuminate\Queue\Console\WorkCommand->handle ( /var/www/html/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:101 )
at .Illuminate\Container\BoundMethod::Illuminate\Container\{closure} ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36 )
at .Illuminate\Container\Util::unwrapIfClosure ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Util.php:40 )
at .Illuminate\Container\BoundMethod::callBoundMethod ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:93 )
at .Illuminate\Container\BoundMethod::call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:37 )
at .Illuminate\Container\Container->call ( /var/www/html/vendor/laravel/framework/src/Illuminate/Container/Container.php:653 )
at .Illuminate\Console\Command->execute ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php:136 )
at .Symfony\Component\Console\Command\Command->run ( /var/www/html/vendor/symfony/console/Command/Command.php:298 )
at .Illuminate\Console\Command->run ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Command.php:121 )
at .Symfony\Component\Console\Application->doRunCommand ( /var/www/html/vendor/symfony/console/Application.php:1040 )
at .Symfony\Component\Console\Application->doRun ( /var/www/html/vendor/symfony/console/Application.php:301 )
at .Symfony\Component\Console\Application->run ( /var/www/html/vendor/symfony/console/Application.php:171 )
at .Illuminate\Console\Application->run ( /var/www/html/vendor/laravel/framework/src/Illuminate/Console/Application.php:94 )
at .Illuminate\Foundation\Console\Kernel->handle ( /var/www/html/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:129 )
at .{main} ( /var/www/html/artisan:37 )The first meaningful errors seen in the elasticsearch logs were:
on the second master elasticsearch-1-master-1 there is flurry off slow cluster state computing and the corresponding flapping in health.
[2023-12-24T06:37:10,628][INFO ][o.e.c.m.MetadataCreateIndexService] [elasticsearch-1-master-1] [mwdb_e94e6a3fa5_content_first] creating index, cause [api], templates [mwdb], shards [1]/[1] [0/1994]
[2023-12-24T06:37:10,679][WARN ][o.e.c.s.MasterService ] [elasticsearch-1-master-1] took [13.4s], which is over [10s], to compute cluster state update for [create-index [mwdb_e94e6a3fa5_content_first], cause [api]]
[2023-12-24T06:37:25,056][INFO ][o.e.c.m.MetadataMappingService] [elasticsearch-1-master-1] [mwdb_e94e6a3fa5_content_first/7q_O3tHqTsOSEUpMKUA9ag] create_mapping [_doc]
[2023-12-24T06:37:25,202][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-1-master-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[mwdb_e94e6a3fa5_content_first][0]]]).
[2023-12-24T06:37:45,299][INFO ][o.e.c.m.MetadataCreateIndexService] [elasticsearch-1-master-1] [mwdb_e94e6a3fa5_general_first] creating index, cause [api], templates [mwdb], shards [1]/[1]
[2023-12-24T06:37:45,341][WARN ][o.e.c.s.MasterService ] [elasticsearch-1-master-1] took [13.2s], which is over [10s], to compute cluster state update for [create-index [mwdb_e94e6a3fa5_general_first], cause [api]]
[2023-12-24T06:37:59,459][INFO ][o.e.c.m.MetadataMappingService] [elasticsearch-1-master-1] [mwdb_e94e6a3fa5_general_first/8Knb2WMMRkyUd__AxQ6llA] create_mapping [_doc]
[2023-12-24T06:37:59,604][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-1-master-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[mwdb_e94e6a3fa5_general_first][0]]]).
[2023-12-24T23:34:20,031][INFO ][o.e.c.m.MetadataCreateIndexService] [elasticsearch-1-master-1] [mwdb_cf06321a6a_content_first] creating index, cause [api], templates [mwdb], shards [1]/[1]
[2023-12-24T23:34:20,082][WARN ][o.e.c.s.MasterService ] [elasticsearch-1-master-1] took [13.8s], which is over [10s], to compute cluster state update for [create-index [mwdb_cf06321a6a_content_first], cause [api]]
[2023-12-24T23:34:37,149][INFO ][o.e.c.m.MetadataMappingService] [elasticsearch-1-master-1] [mwdb_cf06321a6a_content_first/zEVVMI2ZQoC9j5DpUzXwLg] create_mapping [_doc]
[2023-12-24T23:34:37,294][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-1-master-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[mwdb_cf06321a6a_content_first][0]]]).
[2023-12-24T23:34:57,862][INFO ][o.e.c.m.MetadataCreateIndexService] [elasticsearch-1-master-1] [mwdb_cf06321a6a_general_first] creating index, cause [api], templates [mwdb], shards [1]/[1]
[2023-12-24T23:34:57,905][WARN ][o.e.c.s.MasterService ] [elasticsearch-1-master-1] took [13.6s], which is over [10s], to compute cluster state update for [create-index [mwdb_cf06321a6a_general_first], cause [api]]
[2023-12-24T23:35:12,614][INFO ][o.e.c.m.MetadataMappingService] [elasticsearch-1-master-1] [mwdb_cf06321a6a_general_first/ScAe44OHTjqT3kzJoTwvRQ] create_mapping [_doc]
[2023-12-24T23:35:12,766][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-1-master-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[mwdb_cf06321a6a_general_first][0]]]).
[2023-12-25T22:22:15,885][WARN ][o.e.t.InboundHandler ] [elasticsearch-1-master-1] handling inbound transport message [InboundMessage{Header{268644}{7.10.2}{4883636}{true}{false}{false}{false}{indices:admin/create}}] took [6803ms]
which is above the warn threshold of [5000ms]
[2023-12-26T11:04:50,542][WARN ][o.e.t.InboundHandler ] [elasticsearch-1-master-1] handling inbound transport message [InboundMessage{Header{268644}{7.10.2}{5425478}{true}{false}{false}{false}{indices:admin/create}}] took [6603ms]
which is above the warn threshold of [5000ms]
[2023-12-26T17:06:54,513][WARN ][o.e.t.InboundHandler ] [elasticsearch-1-master-1] handling inbound transport message [InboundMessage{Header{268644}{7.10.2}{4954606}{true}{false}{false}{false}{indices:admin/create}}] took [6602ms]
which is above the warn threshold of [5000ms]