
Elasticsearch errors about BulkShardRequest
Closed, Declined (Public)

Description

elastic2014 seems to have a lot of errors related to BulkShardRequest (see example below).

[2017-06-06T03:08:04,546][DEBUG][org.elasticsearch.action.bulk.TransportShardBulkAction] [jawiki_content_1487427148][4] failed to execute bulk item (update) BulkShardRequest [[jawiki_content_1487427148][4]] containing [134] requests
org.elasticsearch.index.engine.DocumentMissingException: [page][1615532]: document missing
        at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:92) ~[elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:81) ~[elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeUpdateRequest(TransportShardBulkAction.java:269) ~[elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:159) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:113) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:69) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:939) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:908) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:113) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:322) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:264) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:888) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:885) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.index.shard.IndexShardOperationsLock.acquire(IndexShardOperationsLock.java:147) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationLock(IndexShard.java:1654) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:897) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.access$400(TransportReplicationAction.java:93) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:281) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:260) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:252) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:618) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.3.2.jar:5.3.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
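
To get a sense of which indices and documents are affected, log entries like the one above can be tallied. The following is a minimal sketch, assuming the log keeps the two-line shape shown here (a "failed to execute bulk item" header followed by a DocumentMissingException line). The function name `missing_docs` and both regular expressions are illustrative and not part of any existing tooling.

```python
import re

# Header line of a failed bulk item, e.g.
# "... [jawiki_content_1487427148][4] failed to execute bulk item (update) ..."
HEADER_RE = re.compile(
    r"\[(?P<index>[^\]\[]+)\]\[(?P<shard>\d+)\] failed to execute bulk item \((?P<op>\w+)\)"
)

# Exception line, e.g.
# "...DocumentMissingException: [page][1615532]: document missing"
MISSING_RE = re.compile(
    r"DocumentMissingException: \[(?P<type>[^\]]+)\]\[(?P<id>[^\]]+)\]: document missing"
)


def missing_docs(lines):
    """Yield (index, shard, doc_type, doc_id) for each document-missing failure."""
    current = None  # (index, shard) from the most recent header line
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            current = (header.group("index"), int(header.group("shard")))
            continue
        missing = MISSING_RE.search(line)
        if missing and current is not None:
            yield current + (missing.group("type"), missing.group("id"))
            current = None  # wait for the next header before matching again
```

Feeding the generator into `collections.Counter` keyed on the index name would show whether the failures cluster on the `_content` indices.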

Event Timeline

Restricted Application added a subscriber: Aklapper.

Doc 1615532 is in the general index at ノート:速水太郎, but a page with the same title exists at 速水太郎.
Another example is page_id 12 from commons; this time I don't see a similar page in the general index.
Looking at the oozie job that populates pageviews, I don't see anything that filters for pages in the content namespace. I suspect the error is not new, and that we simply log these DEBUG messages now.

Mentioned in SAL (#wikimedia-operations) [2017-06-06T08:39:22Z] <gehel> raise log level to WARN for TransportShardBulkAction on elasticsearch cirrus - T167091

Change 357371 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch - raise logging of TransportShardBulkAction to WARN

https://gerrit.wikimedia.org/r/357371

Change 357371 merged by Gehel:
[operations/puppet@production] elasticsearch - raise logging of actions to INFO

https://gerrit.wikimedia.org/r/357371
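
For reference, the change above was made through puppet, and its contents are not reproduced in this task. A comparable effect can be had at runtime, since Elasticsearch 5.x accepts logger levels as dynamic cluster settings under the `logger.` prefix. The sketch below only builds the request body that would be sent with `PUT /_cluster/settings`; the helper name `logger_level_settings` is hypothetical, and the level is a parameter because both WARN and INFO are mentioned in this thread.

```python
import json

# Logger named in the log excerpt in the task description.
LOGGER = "org.elasticsearch.action.bulk.TransportShardBulkAction"


def logger_level_settings(logger_name, level, persistent=False):
    """Build a cluster-settings body that sets the level of one logger.

    Transient settings are lost on a full cluster restart; persistent
    settings survive it.
    """
    scope = "persistent" if persistent else "transient"
    return {scope: {"logger." + logger_name: level}}


if __name__ == "__main__":
    # Print the JSON body; sending it to the cluster is left to the caller.
    print(json.dumps(logger_level_settings(LOGGER, "WARN"), indent=2))
```

A transient setting is convenient for quieting the logs quickly, but a configuration-managed change is what keeps the level in place across restarts.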

debt triaged this task as Medium priority. Jun 8 2017, 5:07 PM
debt edited projects, added Discovery-Search (Current work); removed Discovery-Search.
debt moved this task from Incoming to Needs review on the Discovery-Search (Current work) board.

@dcausse, @EBernhardson: this error is now filtered in the logs. Do we want to address the root cause? Or is this just a side issue that is safe to ignore?

dcausse lowered the priority of this task from Medium to Low. Jun 21 2017, 2:36 PM
dcausse moved this task from Needs review to Incoming on the Discovery-Search (Current work) board.

It would be interesting to know why we get these errors, but I don't think it's very urgent. I'm pretty sure that these errors are not new.
There is definitely something in our indexing pipeline that is sending invalid docs.
Lowering the priority and moving to the backlog; please change it if you think this is important to address.
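
One way to narrow down which part of the pipeline sends these updates would be to inspect bulk responses on the client side. This is a rough sketch, assuming the standard bulk response shape (an `items` list in which each entry is keyed by the operation name, with failed entries carrying an `error` object) and assuming the error type is reported as `document_missing_exception`. The function name is hypothetical, and this is not how CirrusSearch currently handles responses.

```python
def document_missing_items(bulk_response):
    """Return (index, doc_id) pairs for bulk items that failed because the
    target document does not exist."""
    missing = []
    for item in bulk_response.get("items", []):
        # Each item is a single-key dict such as {"update": {...}}.
        for result in item.values():
            error = result.get("error")
            if isinstance(error, dict) and error.get("type") == "document_missing_exception":
                missing.append((result.get("_index"), result.get("_id")))
    return missing
```

Logging these pairs together with the job that issued the bulk request would show whether the failures come from one producer, such as the pageview job mentioned earlier, or from several.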

debt added a subscriber: debt.

Gotcha, thanks for taking a look, @dcausse. I'll move it to the backlog board as up-next work (we're trying to keep the backlog column on the sprint board for stuff we need to tackle first).

Context lost from 3 years ago, closing it for now.