Seen while building the completion index on cawiki:
Sep 5, 2025 @ 02:50:44.091 cawiki New index seems too small compared to the previous index 1007609/890039 > 0.1 (old/new > threshold). Aborting. (Use --force to bypass)
[...]
Sep 5, 2025 @ 02:48:43.982 cawiki 2025-09-05 04:48:43 Waiting to re-check counts...
Sep 5, 2025 @ 02:48:43.924 cawiki 2025-09-05 04:48:12 Optimizing index...ok.
Sep 5, 2025 @ 02:48:12.790 cawiki 2025-09-05 04:48:12 Bulk requests 298 (retried 0), 890040/890039/1/0/0 (sent/created/updated/noop/error)
Sep 5, 2025 @ 02:48:12.789 cawiki 2025-09-05 04:48:12 Indexing from general index done.
Sep 5, 2025 @ 02:48:12.789 cawiki 2025-09-05 04:48:12 Exported 681703 (781151 total hits) from the search indices and indexed 890040.
Sep 5, 2025 @ 02:48:12.634 cawiki 100% done...
Sep 5, 2025 @ 02:48:12.478 cawiki 2025-09-05 04:48:12 total hits: 703
Sep 5, 2025 @ 02:48:12.478 cawiki 2025-09-05 04:48:12 Indexing 703 documents from general with batchId: 1757040156
Sep 5, 2025 @ 02:48:11.549 cawiki 2025-09-05 04:48:11 Indexing from content index done.
Sep 5, 2025 @ 02:47:17.545 cawiki 86% done... <= stops at 86%
Sep 5, 2025 @ 02:47:11.498 cawiki 84% done...
[...]
Sep 5, 2025 @ 02:42:47.629 cawiki 2% done...
Sep 5, 2025 @ 02:42:38.521 cawiki 2025-09-05 04:42:38 total hits: 780448
Sep 5, 2025 @ 02:42:38.521 cawiki 2025-09-05 04:42:38 Indexing 780448 documents from content with batchId: 1757040156
Thankfully the new index was not promoted, but something is clearly stopping the SearchAfter loop before it has a chance to export the 780448 docs reported by total hits: the progress output stops at 86% and only 681703 docs were exported out of 781151 total hits.
We know that Elastica can be a bit lenient when it comes to error handling.
Here we may suspect \Elastica\ResultSet\DefaultBuilder, which does:
  $results = [];
  if (!isset($data['hits']['hits'])) {
      return $results;
  }

More broadly, there might also be a problem in the Http transport class, which does not appear to fail on an HTTP error code but instead trusts that a JSON body, when present, is a valid OpenSearch response that mentions any error explicitly.
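To illustrate the DefaultBuilder concern, here is a minimal standalone sketch. `extractHits()` is a hypothetical stand-in that mirrors the quoted check (it is not the Elastica code itself), and both response bodies are made up for the example. It shows that a body without `hits.hits` yields the same empty result as a genuine last page, so a loop that stops on an empty page cannot tell the two apart.

```php
<?php
// Hypothetical stand-in mirroring the DefaultBuilder check quoted in this task.
function extractHits(array $data): array {
    $results = [];
    if (!isset($data['hits']['hits'])) {
        // An error body (no hits.hits) silently becomes "no results".
        return $results;
    }
    foreach ($data['hits']['hits'] as $hit) {
        $results[] = $hit;
    }
    return $results;
}

// Made-up example bodies: a genuine empty last page, and an error-like body.
$lastPage = [
    '_shards' => ['total' => 1, 'successful' => 1, 'failed' => 0],
    'hits' => ['hits' => []],
];
$errorBody = ['message' => 'hypothetical upstream failure'];

// Both produce an empty array, so the loop would end in either case.
var_dump(extractHits($lastPage) === extractHits($errorBody)); // bool(true)
```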
We could possibly address this in SearchAfter by manually inspecting the response body for search metadata such as _shards, which should always be set on a valid search response.
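A rough sketch of what such a check could look like follows. The helper name `assertValidSearchResponse()` and the exception messages are hypothetical, and the extra check on `_shards.failed` goes beyond what is proposed here and is only an assumption about what we might also want to catch.

```php
<?php
/**
 * Hypothetical helper: throws unless $data looks like a complete search response.
 *
 * @param array $data decoded JSON body of the search response
 */
function assertValidSearchResponse(array $data): void {
    if (isset($data['error'])) {
        // The backend reported an explicit error in the body.
        throw new RuntimeException('Search failed: ' . json_encode($data['error']));
    }
    if (!isset($data['_shards'], $data['hits']['hits'])) {
        // Not a search response at all (e.g. a proxy or gateway error body).
        throw new RuntimeException('Response is missing _shards or hits.hits');
    }
    // Assumption: partial results should also abort the export.
    if (($data['_shards']['failed'] ?? 0) > 0) {
        throw new RuntimeException('Partial results: some shards failed');
    }
}
```

In SearchAfter this could presumably be called on the decoded body of each page (e.g. what `$resultSet->getResponse()->getData()` returns) before deciding whether the loop is finished, so that an invalid response aborts the export instead of ending it early.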
AC:
- CirrusSearch SearchAfter does not skip documents.