
CirrusSearch SearchAfter implementation may skip documents
Closed, Resolved · Public · 2 Estimated Story Points

Description

Seen while building the completion index on cawiki:

Sep 5, 2025 @ 02:50:44.091 cawiki New index seems too small compared to the previous index 1007609/890039 > 0.1 (old/new > threshold). Aborting. (Use --force to bypass)
[...]
Sep 5, 2025 @ 02:48:43.982 cawiki 2025-09-05 04:48:43 Waiting to re-check counts...
Sep 5, 2025 @ 02:48:43.924 cawiki 2025-09-05 04:48:12 Optimizing index...ok.
Sep 5, 2025 @ 02:48:12.790 cawiki 2025-09-05 04:48:12 Bulk requests 298 (retried 0), 890040/890039/1/0/0 (sent/created/updated/noop/error)
Sep 5, 2025 @ 02:48:12.789 cawiki 2025-09-05 04:48:12 Indexing from general index done.
Sep 5, 2025 @ 02:48:12.789 cawiki 2025-09-05 04:48:12 Exported 681703 (781151 total hits) from the search indices and indexed 890040.
Sep 5, 2025 @ 02:48:12.634 cawiki     100% done...
Sep 5, 2025 @ 02:48:12.478 cawiki 2025-09-05 04:48:12 total hits: 703
Sep 5, 2025 @ 02:48:12.478 cawiki 2025-09-05 04:48:12 Indexing 703 documents from general with batchId: 1757040156
Sep 5, 2025 @ 02:48:11.549 cawiki 2025-09-05 04:48:11 Indexing from content index done.
Sep 5, 2025 @ 02:47:17.545 cawiki     86% done... <= stops at 86%
Sep 5, 2025 @ 02:47:11.498 cawiki     84% done...
[...]
Sep 5, 2025 @ 02:42:47.629 cawiki     2% done...
Sep 5, 2025 @ 02:42:38.521 cawiki 2025-09-05 04:42:38 total hits: 780448
Sep 5, 2025 @ 02:42:38.521 cawiki 2025-09-05 04:42:38 Indexing 780448 documents from content with batchId: 1757040156

Fortunately the index failed to promote, but something is clearly stopping the SearchAfter loop before it has a chance to export the 780448 docs reported by total hits for the content index. Overall, only 681703 of the 781151 total hits were exported.

We know that Elastica can be lenient when it comes to error handling.
A likely suspect here is \Elastica\ResultSet\DefaultBuilder, which returns an empty result list when the response body has no hits.hits entry:

$results = [];

if (!isset($data['hits']['hits'])) {
    return $results;
}
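
To illustrate why this early return matters, here is a minimal, self-contained sketch. It is not CirrusSearch or Elastica code: buildResults(), exportAll() and the simulated responses are hypothetical stand-ins, and the loop is assumed to treat an empty page as the end of the results. Under that assumption, an error body is indistinguishable from exhaustion and the export stops early without any failure being reported.

```php
<?php
// Stand-in for the lenient result building quoted above (not Elastica itself):
// a body without hits.hits silently becomes an empty page.
function buildResults(array $data): array {
    if (!isset($data['hits']['hits'])) {
        return [];
    }
    return $data['hits']['hits'];
}

// Hypothetical search_after style loop that stops at the first empty page.
function exportAll(callable $fetchPage): int {
    $exported = 0;
    $searchAfter = null;
    while (true) {
        $hits = buildResults($fetchPage($searchAfter));
        if ($hits === []) {
            break; // genuine end of results, or a swallowed failure
        }
        $exported += count($hits);
        $searchAfter = end($hits)['sort']; // sort values of the last hit
    }
    return $exported;
}

// Simulated backend: 6 documents exist, but the third request returns an error body.
$responses = [
    ['hits' => ['hits' => [['_id' => 'a', 'sort' => [1]], ['_id' => 'b', 'sort' => [2]]]]],
    ['hits' => ['hits' => [['_id' => 'c', 'sort' => [3]], ['_id' => 'd', 'sort' => [4]]]]],
    ['error' => ['type' => 'simulated_failure'], 'status' => 503],
];
$fetchPage = function (?array $searchAfter) use (&$responses): array {
    return array_shift($responses);
};

$exported = exportAll($fetchPage);
echo "exported $exported of 6 documents\n"; // prints "exported 4 of 6 documents"
```

The loop exits cleanly after 4 documents, which mirrors the pattern in the log above: progress stops partway and the run still reports the index as done.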

More broadly, there might also be a problem in the Http transport class, which does not appear to fail on HTTP error codes but instead trusts that a JSON body, when present, is a valid OpenSearch response that mentions any error explicitly.
We could possibly solve this issue in SearchAfter by manually inspecting the response body for search metadata such as _shards, which should always be set on a valid search response.
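
A minimal sketch of such a check, assuming the decoded response body is available as a PHP array. checkSearchResponse() is a hypothetical helper, not the code that was merged, and rejecting responses with failed shards is an extra assumption that goes beyond the _shards presence test proposed above.

```php
<?php
/**
 * Hypothetical helper: throws unless $data looks like a complete search response.
 */
function checkSearchResponse(array $data): void {
    // An explicit error reported in the body.
    if (isset($data['error'])) {
        throw new RuntimeException('Search failed: ' . json_encode($data['error']));
    }
    // _shards should always be present on a valid search response.
    if (!isset($data['_shards']['total'], $data['_shards']['successful'])) {
        throw new RuntimeException('Not a search response: missing _shards metadata');
    }
    // Assumption: partial results are also treated as a failure, since
    // documents on a failed shard would be skipped.
    if (($data['_shards']['failed'] ?? 0) > 0) {
        throw new RuntimeException('Search response has failed shards');
    }
    if (!isset($data['hits']['hits'])) {
        throw new RuntimeException('Search response has no hits.hits');
    }
}

// A valid empty page passes: this is a genuine end of results.
checkSearchResponse([
    '_shards' => ['total' => 1, 'successful' => 1, 'skipped' => 0, 'failed' => 0],
    'hits' => ['hits' => []],
]);

// An error body is rejected instead of being read as an empty page.
try {
    checkSearchResponse(['error' => ['type' => 'simulated_failure'], 'status' => 503]);
} catch (RuntimeException $e) {
    echo 'rejected: ', $e->getMessage(), "\n";
}
```

Calling a check like this on every page before reading the hits would turn a silent early stop into an exception that the caller can retry or report.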

AC:

  • CirrusSearch SearchAfter does not skip documents.

Event Timeline

Restricted Application added a subscriber: Aklapper.
dcausse renamed this task from CirrusSearch SearchAfter implement may skip documents to CirrusSearch SearchAfter implementation may skip documents. · Sep 8 2025, 6:10 AM
pfischer set the point value for this task to 2. · Sep 15 2025, 3:21 PM

Change #1189479 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] SearchAfter: handle failures properly

https://gerrit.wikimedia.org/r/1189479

Change #1189479 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] SearchAfter: handle failures properly

https://gerrit.wikimedia.org/r/1189479