Maniphest T110547

Investigate usage of elasticsearch optimize API
Closed, Declined
Public
Assigned To

None

Authored By

	EBernhardson
	Aug 27 2015, 4:45 PM

Description

Elasticsearch has an API for forcing segments to be merged, optionally restricted to segments that contain deleted documents.
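
As a rough illustration of the call being discussed, here is a minimal sketch that only builds the HTTP request and does not send it. It assumes the Elasticsearch 1.x `_optimize` endpoint with its `only_expunge_deletes` parameter (later versions renamed the endpoint to `_forcemerge`). The host `http://localhost:9200` and the helper name `build_optimize_request` are placeholders, not anything used in production.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_optimize_request(base_url, index, only_expunge_deletes=True):
    """Build (but do not send) a POST request to the optimize endpoint.

    Hypothetical helper: base_url is a placeholder for a cluster address.
    """
    # Elasticsearch expects lowercase "true"/"false" in query strings.
    query = urlencode({"only_expunge_deletes": str(only_expunge_deletes).lower()})
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_optimize?{query}"
    return Request(url, method="POST")


if __name__ == "__main__":
    req = build_optimize_request("http://localhost:9200", "zhwiki_content")
    print(req.get_method(), req.full_url)
```

Keeping the request construction separate from sending it makes it easy to review which indexes would be touched before anything runs against a live cluster.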

Across the cluster we have a few indexes with a large number of deleted documents. zhwiki_content has 2.8M documents and 1.6M deleted documents. fawiki_content has 1.3M documents and 683k deleted documents. enwiki_general, our second largest index, has 24M documents and 9.4M deleted documents. Cluster-wide (excluding apifeatureusage) there are 2.3B documents and 541M deleted documents.
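
To put those counts on a common scale, the following sketch computes the share of each index that is taken up by deleted documents. It assumes the "documents" figures are live documents that do not already include the deleted ones (which is how Elasticsearch index stats usually report `docs.count` and `docs.deleted`). If the figures were meant as totals, the denominator would need to change. The function name `deleted_fraction` is illustrative only.

```python
def deleted_fraction(live_docs, deleted_docs):
    """Share of all stored documents (live + deleted) that are deleted."""
    total = live_docs + deleted_docs
    return deleted_docs / total if total else 0.0


# Counts quoted in the task description: (live documents, deleted documents).
# Assumption: the live count excludes the deleted documents.
counts = {
    "zhwiki_content": (2_800_000, 1_600_000),
    "fawiki_content": (1_300_000, 683_000),
    "enwiki_general": (24_000_000, 9_400_000),
    "cluster-wide": (2_300_000_000, 541_000_000),
}

if __name__ == "__main__":
    for name, (live, deleted) in counts.items():
        print(f"{name}: {deleted_fraction(live, deleted):.1%} deleted")
```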

Nearly everything, from query latency and accuracy to shuffling shards around the cluster, would be more efficient if the indexes carried a smaller fraction of deleted documents.

Event Timeline

EBernhardson created this task. Aug 27 2015, 4:45 PM

EBernhardson set the priority of this task to Needs Triage.

EBernhardson updated the task description.

EBernhardson added a project: CirrusSearch.

EBernhardson added subscribers: EBernhardson, dcausse, • chasemp.

Restricted Application added a project: Discovery-ARCHIVED. Aug 27 2015, 4:45 PM

Restricted Application added a subscriber: Aklapper.

• Deskana moved this task from Inbox to Technical on the CirrusSearch board. Dec 31 2015, 5:01 AM

Restricted Application added a subscriber: StudiesWorld. Dec 31 2015, 5:01 AM

Aklapper triaged this task as Low priority. Oct 9 2023, 12:28 AM

Aklapper removed a project: Discovery-ARCHIVED.

Restricted Application added a project: Discovery-Search. Oct 9 2023, 12:28 AM

Restricted Application added subscribers: Huji, Stang.

This does not seem to be an issue at the moment.

Stang unsubscribed. Oct 23 2023, 10:33 PM