
Write a script that creates the completion suggester index
Closed, Resolved · Public

Description

For a first test, I think this script should work like the in-place reindex and generate a set of suggester fields:

  • exact
  • stop words
  • exact with geo context
  • stop words with geo context
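As an illustration of the four fields listed above, here is a minimal sketch of what the mapping could look like, written as a Python dict. It assumes the Elasticsearch 1.x completion suggester mapping format (`type: completion`, `payloads`, and a `geo` context). The field names, the `standard` and `stop` analyzers, and the `100km` precision are placeholders chosen for the example, not what the script actually uses.

```python
import json


def completion_field(analyzer, geo_context=False):
    """Build one completion suggester field definition (hypothetical helper)."""
    field = {
        "type": "completion",
        "analyzer": analyzer,
        # Payloads are enabled because the task plans to use them.
        "payloads": True,
    }
    if geo_context:
        # Geo context as in the ES 1.x context suggester; precision is a placeholder.
        field["context"] = {
            "location": {"type": "geo", "precision": "100km", "neighbors": True}
        }
    return field


def build_suggest_properties():
    """Return the four suggester fields from the task description.

    Field names are assumptions made for this sketch. "standard" stands in
    for an exact-match analyzer and "stop" for a stop-word analyzer.
    """
    return {
        "suggest": completion_field("standard"),
        "suggest_stop": completion_field("stop"),
        "suggest_geo": completion_field("standard", geo_context=True),
        "suggest_stop_geo": completion_field("stop", geo_context=True),
    }


if __name__ == "__main__":
    print(json.dumps(build_suggest_properties(), indent=2, sort_keys=True))
```

Keeping the exact and stop-word variants as separate fields means each one builds its own FST, which is why the memory estimate in the description is given per field.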

Each suggest field will be stored in memory. According to the Lucene developers, the generated FST is about 50% larger than the compressed content. If we plan to store only the main namespace, we can roughly estimate the memory footprint from the size of the enwiki-XXXXXXX-all-titles-in-ns0.gz dump (https://dumps.wikimedia.org/). For English Wikipedia this file is 62 MB, so the FST should take ~90 MB in memory.
This estimate is confirmed by the tests done by Mike McCandless (see Performance & Benchmarks at the end of https://www.elastic.co/blog/you-complete-me).
Note that we will use payloads, so the estimated size for 2.1 million titles in English Wikipedia is about 160 MB of RAM per field.
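The arithmetic behind these figures can be sketched as follows. This is only a back-of-the-envelope helper: the function names are made up for the example, the 50% overhead is the rule of thumb quoted above, and multiplying the per-field figure by four assumes that the geo-context fields cost about the same as the plain ones, which has not been measured.

```python
def estimate_fst_size_mb(compressed_mb, overhead=0.5):
    """Estimate in-memory FST size from the gzip-compressed title dump size.

    overhead=0.5 encodes the "about 50% larger than the compressed content"
    rule of thumb from the task description.
    """
    return compressed_mb * (1.0 + overhead)


def total_for_fields_mb(per_field_mb, field_count):
    """Total RAM if every suggester field costs roughly the same (assumption)."""
    return per_field_mb * field_count


if __name__ == "__main__":
    # 62 MB dump for English Wikipedia, without payloads.
    print(estimate_fst_size_mb(62))      # 93.0, i.e. the "~90 MB" above
    # 160 MB per field with payloads, four fields in the first test.
    print(total_for_fields_mb(160, 4))   # 640
```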

Event Timeline

dcausse created this task. Jul 17 2015, 7:54 AM
dcausse claimed this task.
dcausse raised the priority of this task from to Needs Triage.
dcausse updated the task description.
dcausse added a subscriber: dcausse.
Restricted Application added a project: Discovery. Jul 17 2015, 7:54 AM
Restricted Application added a subscriber: Aklapper.
dcausse updated the task description. Jul 17 2015, 10:08 AM
dcausse set Security to None.
dcausse updated the task description. Jul 17 2015, 12:20 PM

Change 230813 had a related patch set uploaded (by DCausse):
WIP: Add a maintenance script to build the completion suggester index

https://gerrit.wikimedia.org/r/230813

Deskana triaged this task as Normal priority. Aug 11 2015, 5:06 PM
Deskana added a subscriber: Deskana.

Change 230813 merged by jenkins-bot:
WIP: Add a maintenance script to build the completion suggester index

https://gerrit.wikimedia.org/r/230813

Smalyshev closed this task as Resolved. Aug 14 2015, 8:55 PM
Smalyshev added a subscriber: Smalyshev.