For a first test I think this script should work like the in-place reindex script and generate a set of suggest fields (see the mapping sketch after the list):
- stop words
- exact with geo context
- stop words with geo context
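As a rough illustration, here is a minimal sketch of what the three suggest fields could look like, assuming the Elasticsearch 1.x completion suggester (which is the version with `payloads` and geo `context`) and the elasticsearch-py client. All index, field, and analyzer names here are hypothetical, not taken from the actual script:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(
    index="enwiki_titles",  # hypothetical index name
    body={
        "settings": {
            "analysis": {
                "analyzer": {
                    # strips English stop words before building the FST
                    "suggest_stop": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "stop"],
                    }
                }
            }
        },
        "mappings": {
            "page": {
                "properties": {
                    # 1. stop words
                    "suggest_stop": {
                        "type": "completion",
                        "analyzer": "suggest_stop",
                        "payloads": True,
                    },
                    # 2. exact with geo context
                    "suggest_exact_geo": {
                        "type": "completion",
                        "analyzer": "simple",
                        "payloads": True,
                        "context": {
                            "location": {"type": "geo", "precision": "100km"}
                        },
                    },
                    # 3. stop words with geo context
                    "suggest_stop_geo": {
                        "type": "completion",
                        "analyzer": "suggest_stop",
                        "payloads": True,
                        "context": {
                            "location": {"type": "geo", "precision": "100km"}
                        },
                    },
                }
            }
        },
    },
)
```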
Each suggest field will be stored in memory. According to the Lucene developers, the generated FST is roughly 50% larger than the compressed content. If we plan to store only the main namespace, we can roughly estimate the in-memory size from the size of the enwiki-XXXXXXX-all-titles-in-ns0.gz file at https://dumps.wikimedia.org/. For English Wikipedia this file is 62 MB, so one field should take ~90 MB in memory (62 MB × 1.5 ≈ 93 MB).
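The back-of-the-envelope estimate above, written out, assuming the ~1.5× rule of thumb (the dump filename is left as the placeholder from the text):

```python
import os

FST_OVERHEAD = 1.5  # FST is ~50% larger than the compressed titles


def estimate_fst_mb(dump_path):
    """Estimate the in-memory FST size (MB) from a compressed titles dump."""
    gz_mb = os.path.getsize(dump_path) / (1024 * 1024)
    return gz_mb * FST_OVERHEAD


# With the downloaded enwiki dump (~62 MB compressed), this prints ~93 MB:
print(estimate_fst_mb("enwiki-XXXXXXX-all-titles-in-ns0.gz"))
```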
This estimate is confirmed by the benchmarks run by Mike McCandless (see the Performance &amp; Benchmarks section at the end of https://www.elastic.co/blog/you-complete-me).
Note that we will use payloads, so the estimated size for the 2.1 million titles in English Wikipedia is about 160 MB of RAM per field.
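Payloads cost extra memory because they are stored alongside each entry in the FST. For reference, a hypothetical example of indexing one title with a payload in the ES 1.x completion-suggester document format (index, type, field, and payload names are assumptions carried over from the sketch above, not the script's actual ones):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.index(
    index="enwiki_titles",  # hypothetical index name
    doc_type="page",
    id=12345,
    body={
        "suggest_stop": {
            "input": ["Nelson Mandela"],    # strings the suggester matches on
            "payload": {"page_id": 12345},  # stored with the FST entry -> extra RAM
            "weight": 100,                  # e.g. a popularity score for ranking
        }
    },
)
```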