
Investigate usage of elasticsearch optimize API
Closed, Declined · Public

Description

Elasticsearch has an API for forcing segments to be merged, optionally restricted to segments that contain deleted documents.
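As a rough illustration of what such a call might look like, here is a minimal sketch that only builds the request and does not send it. It assumes the endpoint is `_optimize` (the name used in Elasticsearch 1.x; later versions renamed it `_forcemerge`) and uses the `only_expunge_deletes` and `max_num_segments` query parameters. The helper name `build_optimize_request` and the host URL are hypothetical.

```python
from urllib.parse import urlencode


def build_optimize_request(base_url, index, only_expunge_deletes=True,
                           max_num_segments=None):
    """Return (method, url) for an optimize call on one index.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    params = {}
    if only_expunge_deletes:
        # Restrict merging to segments that contain deleted documents.
        params["only_expunge_deletes"] = "true"
    if max_num_segments is not None:
        # Merge down to at most this many segments per shard.
        params["max_num_segments"] = str(max_num_segments)

    url = f"{base_url.rstrip('/')}/{index}/_optimize"
    if params:
        url += "?" + urlencode(params)
    return "POST", url


if __name__ == "__main__":
    # The host below is an assumption for illustration only.
    method, url = build_optimize_request("http://localhost:9200", "zhwiki_content")
    print(method, url)
```

Keeping request construction separate from sending makes it easy to review exactly which indexes and options would be touched before running anything against a production cluster.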

Across the cluster we have a few indexes with quite a few deleted documents. zhwiki_content has 2.8M documents and 1.6M deleted documents. fawiki_content has 1.3M documents and 683k deleted documents. enwiki_general, our second largest index, has 24M documents and 9.4M deleted documents. Cluster wide (excluding apifeatureusage) there are 2.3B documents and 541M deleted documents.
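To put those figures on a common scale, the sketch below computes the share of deleted documents per index. It assumes the quoted "documents" count covers live documents only and that deleted documents are counted separately (as in the `docs.count` and `docs.deleted` fields of index stats), so the share is deleted / (live + deleted). The function names are illustrative.

```python
# Figures quoted in the task description: (live documents, deleted documents).
INDEX_STATS = {
    "zhwiki_content": (2_800_000, 1_600_000),
    "fawiki_content": (1_300_000, 683_000),
    "enwiki_general": (24_000_000, 9_400_000),
    "cluster (excluding apifeatureusage)": (2_300_000_000, 541_000_000),
}


def deleted_fraction(live, deleted):
    """Share of all stored documents that are marked deleted.

    Assumes `live` excludes deleted documents.
    """
    total = live + deleted
    return deleted / total if total else 0.0


def report(stats):
    """Return (name, percent deleted) pairs, highest share first."""
    rows = sorted(stats.items(),
                  key=lambda item: deleted_fraction(*item[1]),
                  reverse=True)
    return [(name, round(100 * deleted_fraction(*counts), 1))
            for name, counts in rows]


if __name__ == "__main__":
    for name, percent in report(INDEX_STATS):
        print(f"{name}: {percent}% deleted")
```

Under that assumption, roughly a third of the stored documents in the two smaller wikis are deleted, against just under a fifth cluster-wide.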

Just about everything, from query latency and accuracy to shuffling shards around the cluster, would be more efficient if it had to go through a smaller fraction of deleted documents.

Event Timeline

EBernhardson set the priority of this task to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a project: CirrusSearch.
Restricted Application added a subscriber: Aklapper.
Aklapper removed a project: Discovery-ARCHIVED.
Restricted Application added subscribers: Huji, Stang.
Gehel subscribed.

This does not seem to be an issue at the moment.