
Investigate usage of elasticsearch optimize API
Open, Needs Triage, Public

Description

Elasticsearch has an API for forcing segments to be merged, optionally limiting the merge to segments that contain deleted documents.
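A minimal sketch of what such a call could look like. It assumes a node reachable at a hypothetical `http://localhost:9200`, and the helper name `build_optimize_request` is made up for illustration. The endpoint was `_optimize` in Elasticsearch 1.x and became `_forcemerge` in 2.1. `only_expunge_deletes` restricts the merge to segments carrying deleted documents, while `max_num_segments` instead merges down to a target segment count; this sketch treats the two as alternatives. The code only builds the request. Sending it (for example with `urllib.request.urlopen`) starts real merge I/O on the cluster.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: hypothetical local node on the default HTTP port.
ES_HOST = "http://localhost:9200"


def build_optimize_request(index, only_expunge_deletes=True,
                           max_num_segments=None, endpoint="_optimize"):
    """Build (but do not send) a POST request for the optimize API.

    endpoint: "_optimize" on Elasticsearch 1.x, "_forcemerge" on 2.1+.
    """
    if only_expunge_deletes and max_num_segments is not None:
        # This sketch treats the two modes as alternatives.
        raise ValueError("choose only_expunge_deletes or max_num_segments")
    params = {}
    if only_expunge_deletes:
        # Merge only segments with deleted documents.
        params["only_expunge_deletes"] = "true"
    if max_num_segments is not None:
        # Merge down to at most this many segments per shard.
        params["max_num_segments"] = str(max_num_segments)
    url = f"{ES_HOST}/{index}/{endpoint}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="POST")


req = build_optimize_request("zhwiki_content")
print(req.get_method(), req.full_url)
```

Limiting the merge to deleted documents avoids rewriting segments that are already clean. A full merge to one segment would also drop deletes, but it rewrites the whole shard.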

Several indexes across the cluster have a large number of deleted documents. zhwiki_content has 2.8M documents and 1.6M deleted documents, fawiki_content has 1.3M documents and 683k deleted documents, and enwiki_general, our second-largest index, has 24M documents and 9.4M deleted documents. Cluster-wide (excluding apifeatureusage) there are 2.3B documents and 541M deleted documents.
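To put these counts in proportion, the sketch below computes the deleted share of each index. It assumes the "documents" figures are live document counts (as in the `docs.count` column of `_cat/indices`), so the segments hold live plus deleted documents in total.

```python
# Counts copied from the description above. Assumption: "documents" means
# live documents, so segments store live + deleted documents in total.
COUNTS = {
    "zhwiki_content": (2_800_000, 1_600_000),
    "fawiki_content": (1_300_000, 683_000),
    "enwiki_general": (24_000_000, 9_400_000),
    "cluster (excl. apifeatureusage)": (2_300_000_000, 541_000_000),
}


def deleted_fraction(live, deleted):
    """Share of stored documents that are deleted but not yet merged away."""
    return deleted / (live + deleted)


for name, (live, deleted) in COUNTS.items():
    print(f"{name}: {deleted_fraction(live, deleted):.1%} deleted")
```

Under that assumption, roughly 36% of what zhwiki_content stores is deleted, 34% for fawiki_content, 28% for enwiki_general, and 19% cluster-wide.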

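Almost everything, from query latency and scoring accuracy to moving shards around the cluster, should be more efficient when there is a smaller fraction of deleted documents to work through. Deleted documents still take up space in the segment files that are copied when a shard relocates, and they still count toward the term statistics used in scoring until their segments are merged.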
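To illustrate the accuracy point: Lucene does not update term statistics when a document is deleted, so document count and document frequency keep including deleted documents until a merge drops them. The sketch uses the BM25 IDF as Lucene's `BM25Similarity` defines it, idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). BM25 is only an illustration here, and all counts are hypothetical.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25 IDF in the form used by Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical index: 100k live documents, 1k of which contain the term.
live_docs, live_df = 100_000, 1_000
# Hypothetical unmerged deletions: 50k documents, 9k of which also contained
# the term (e.g. superseded copies of pages that use it heavily).
deleted_docs, deleted_df = 50_000, 9_000

# Statistics as they would be after expunging the deletes.
idf_clean = bm25_idf(live_docs, live_df)
# Statistics as scoring sees them while the deletes remain in segments.
idf_stale = bm25_idf(live_docs + deleted_docs, live_df + deleted_df)

print(f"idf with deletes expunged: {idf_clean:.3f}")
print(f"idf with deletes counted:  {idf_stale:.3f}")
```

In this made-up case the term looks far more common than it really is among live pages, so it contributes less to scores than it should.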

Event Timeline

EBernhardson raised the priority of this task to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a project: CirrusSearch.
Restricted Application added a project: Discovery. Aug 27 2015, 4:45 PM
Restricted Application added a subscriber: Aklapper.
Deskana moved this task from Uncategorised to Technical on the CirrusSearch board. Dec 31 2015, 5:01 AM
Restricted Application added a subscriber: StudiesWorld. Dec 31 2015, 5:01 AM