
Write a script that creates the completion suggester index
Closed, Resolved · Public


For a first test, I think this script should work like the in-place reindex and generate a set of suggester fields:

  • exact
  • stop words
  • exact with geo context
  • stop words with geo context

Each suggest field will be stored in memory. According to the Lucene developers, the generated FST is about 50% larger than the compressed content. If we plan to store only the main namespace, we can roughly estimate the memory it will take from the size of the enwiki-XXXXXXX-all-titles-in-ns0.gz in ( For English Wikipedia this file is 62 MB, so it should take ~90 MB in memory.
This estimate is confirmed by the tests done by Mike McCandless (see Performance & Benchmarks at the end of
Note that we will use payloads, so the estimated size for the 2.1 million titles in English Wikipedia is about 160 MB of RAM per field.
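
The arithmetic behind these estimates can be sketched as follows. This is only a rough illustration: the 62 MB dump size, the 50% overhead and the 160 MB per-field figure come from the description above, while the function name `estimate_fst_mb` and the assumption that all four fields cost the same amount of RAM (the geo-context fields may well differ) are hypothetical.

```python
def estimate_fst_mb(compressed_mb: float, overhead: float = 0.5) -> float:
    """Rough in-memory FST size, given the gzipped title dump size in MB.

    `overhead` is the "about 50% more" rule of thumb quoted above.
    """
    return compressed_mb * (1.0 + overhead)


# Figures taken from the task description.
ENWIKI_NS0_TITLES_GZ_MB = 62
PER_FIELD_WITH_PAYLOADS_MB = 160

# The four suggester fields proposed for the first test.
SUGGEST_FIELDS = [
    "exact",
    "stop words",
    "exact with geo context",
    "stop words with geo context",
]

# FST-only estimate for a single field (no payloads).
fst_only_mb = estimate_fst_mb(ENWIKI_NS0_TITLES_GZ_MB)

# Assumption: every field costs the same 160 MB once payloads are included.
total_with_payloads_mb = PER_FIELD_WITH_PAYLOADS_MB * len(SUGGEST_FIELDS)

print(f"FST only, one field: ~{fst_only_mb:.0f} MB")
print(f"With payloads, {len(SUGGEST_FIELDS)} fields: ~{total_with_payloads_mb} MB")
```

Under that equal-cost assumption, the four fields together would need on the order of 640 MB of RAM for English Wikipedia; the real number depends on how much the geo context adds per entry.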

Event Timeline

dcausse claimed this task.
dcausse raised the priority of this task from to Needs Triage.
dcausse updated the task description.
dcausse added a subscriber: dcausse.
Restricted Application added a subscriber: Aklapper.
dcausse set Security to None.

Change 230813 had a related patch set uploaded (by DCausse):
WIP: Add a maintenance script to build the completion suggester index

Deskana triaged this task as Medium priority. Aug 11 2015, 5:06 PM
Deskana added a subscriber: Deskana.

Change 230813 merged by jenkins-bot:
WIP: Add a maintenance script to build the completion suggester index

Smalyshev added a subscriber: Smalyshev.