
Evaluate impact of an increase in phrase suggester size
Closed, ResolvedPublic

Description

In the related epic we are considering how we might run the phrase suggester, which provides many of the did-you-mean query suggestions, against a larger corpus. Today, in the production deployment, we have only the title and redirects included in the phrase suggester language model.

The epic proposes a model that maintains new per-language indices, which would save resources compared to a more naive per-wiki implementation. The underlying premise is that we cannot simply expand the existing indices; this task evaluates whether that is actually the case.

Event Timeline

Two primary sets of tests were conducted: an index size analysis and a memory utilization stress test.

Index Size Analysis

To estimate the required disk space, the enwiki_content index was imported to relforge. The imported indices contain no deleted documents and therefore do not exactly reproduce production sizes, but the relative size differences should still be representative. The import was repeated three times with different configurations:

  • Production (Baseline): The current production index configuration.
  • Configuration A (Suggest w/Opening Text): This configuration removes the title/redirect fields from the suggest index and replaces them with opening_text. This represents the long-term target configuration.
  • Configuration B (Extra Suggest Field): This configuration adds a new opening_text.suggest field while retaining the existing suggest field. This approach would likely be necessary for an initial deployment, both to run an A/B test and to maintain compatibility with existing LTR models.

The total size of the resulting index for each configuration was recorded and compared.
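The comparison itself is straightforward once the store sizes are known. As a minimal sketch (the hostname and index names below are illustrative, not taken from this task), the sizes can be read from Elasticsearch's `_cat/indices` API and compared:

```python
import json
from urllib.request import urlopen

def store_size_bytes(base_url: str, index: str) -> int:
    """Fetch the primary store size of one index via the _cat/indices API.

    bytes=b forces raw byte counts; format=json makes the reply parseable.
    """
    with urlopen(f"{base_url}/_cat/indices/{index}?bytes=b&format=json") as resp:
        return int(json.load(resp)[0]["pri.store.size"])

def increase_pct(baseline: float, candidate: float) -> float:
    """Relative size increase of a candidate configuration over the baseline."""
    return 100.0 * (candidate - baseline) / baseline

# Illustrative usage (hostname and index names are hypothetical):
# base = store_size_bytes("http://relforge1003:9200", "enwiki_content_prod")
# conf_a = store_size_bytes("http://relforge1003:9200", "enwiki_content_a")
# print(f"A vs prod: +{increase_pct(base, conf_a):.1f}%")
```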

Memory Impact Assessment

A memory stress test was performed on six instances within the production codfw cluster to determine the available memory headroom. The test was based on the theory that the Linux disk cache is critical for maintaining low search latency and that a reduction in available cache would lead to increased disk I/O and potential performance degradation.

The test involved programmatically allocating and filling blocks of memory with random data, effectively reducing the amount of memory available for the operating system's disk cache. This was performed during a high-traffic period to simulate worst-case conditions. The test was conducted in two phases:

  • Phase 1: 10 GB, 20 GB, and 40 GB of memory were allocated on two hosts each for a duration of 30 minutes.
  • Phase 2: The test was repeated with higher memory allocations of 50 GB, 75 GB, and 100 GB.
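The allocation mechanism can be sketched in a few lines. This is an illustrative reconstruction, not the actual script used; filling each block with random bytes ensures every page is resident and cannot be shared, deduplicated, or compressed away by the kernel:

```python
import os
import time

def allocate_and_fill(total_mb: int, block_mb: int = 256) -> list:
    """Allocate total_mb of memory in block_mb chunks, each filled with
    random bytes so every page is resident. Holding the returned list
    keeps the memory claimed, shrinking the kernel's disk cache."""
    blocks = []
    for _ in range(total_mb // block_mb):
        blocks.append(bytearray(os.urandom(block_mb * 1024 * 1024)))
    return blocks

# Phase 1 style run (illustrative): hold 40 GB resident for 30 minutes.
# held = allocate_and_fill(40 * 1024)
# time.sleep(30 * 60)
```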

During the tests, key performance metrics, including memory availability and disk I/O throughput, were monitored. Search latency (p95 for full_text and more_like searches) was also reviewed to identify any perceptible degradation.

Results
Index Size and Disk Space Requirements

The index size comparison yielded the following results:

Configuration                 | Index Size | Increase from Production
Production (Baseline)         | 245.2 GB   | -
A: Suggest w/Opening Text     | 251.6 GB   | +6.4 GB (2.6%)
B: Extra Suggest Field        | 280.2 GB   | +35.0 GB (14.2%)

Based on these findings, the projected increase in disk space is approximately 2.6% for the long-term solution (Configuration A) and 14.2% for the initial deployment (Configuration B).

Given that the production clusters currently hold approximately 17 TB of data (including replicas and deleted documents), the initial deployment of Configuration B is estimated to require an additional ~2.5 TB of disk space. The long-term implementation of Configuration A would subsequently reduce this requirement to approximately 0.5 TB.
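Scaling the relforge deltas up to the full cluster is a simple proportion; as a quick check (using the numbers from the table above and the stated 17 TB cluster size):

```python
baseline_gb = 245.2   # production baseline index size on relforge
config_a_gb = 251.6   # A: suggest w/opening_text
config_b_gb = 280.2   # B: extra suggest field
cluster_tb = 17.0     # current production data incl. replicas and deletes

def projected_increase_tb(config_gb: float) -> float:
    """Apply the relative relforge increase to the whole cluster."""
    return cluster_tb * (config_gb - baseline_gb) / baseline_gb

print(f"A: +{projected_increase_tb(config_a_gb):.2f} TB")  # A: +0.44 TB
print(f"B: +{projected_increase_tb(config_b_gb):.2f} TB")  # B: +2.43 TB
```

The raw projections (~0.44 TB and ~2.43 TB) round to the ~0.5 TB and ~2.5 TB figures quoted above.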

Memory Impact Analysis

The memory allocation tests produced the following observations:

  • 10 GB and 20 GB Allocation: No discernible impact on any performance metrics was observed, aside from the direct reduction in available memory.
  • 40 GB Allocation: A measurable but operationally insignificant increase in disk throughput was noted, rising to approximately 5 MB/s.
  • 50 GB and 75 GB Allocation: A more noticeable increase in disk I/O occurred, though sustained throughput remained below 10 MB/s after an initial spike.
  • 100 GB Allocation: Disk throughput reached a steady state of approximately 40 MB/s, with spikes to 80+ MB/s.

Crucially, across all test scenarios, no perceptible increase in p95 search latency was observed. While minor fluctuations may have occurred, they were indistinguishable from normal operational noise.

Conclusion and Recommendation

The test results indicate that the production search clusters have significant memory headroom. The allocation of up to 100 GB of memory per host for other purposes resulted in a manageable increase in disk I/O (sustained ~40 MB/s), a level well below the previously observed performance inflection point of over 100 MB/s on older hardware.

A conservative estimate based on these findings is that at least 50 GB of memory per host is readily available without impacting performance. Across the 50 nodes in each cluster, this equates to 2.5 TB of available memory that can be leveraged for disk caching.

The initial deployment, estimated to require an additional 2.5 TB of disk space, would therefore have its increased disk cache requirements fully met by the available memory headroom.
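The headroom-versus-requirement comparison works out because the new data is spread across the same nodes that provide the headroom; a sanity check using the figures above:

```python
new_data_tb = 2.5    # estimated extra disk for Configuration B
nodes = 50           # hosts per cluster
headroom_gb = 50     # conservative per-host memory headroom from the test

# New data landing on each host, assuming an even spread across the cluster.
extra_gb_per_host = new_data_tb * 1000 / nodes
print(f"new data per host: {extra_gb_per_host:.0f} GB")          # 50 GB
print("covered by headroom:", extra_gb_per_host <= headroom_gb)  # True
```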

Recommendation: It is recommended to proceed with the proposed index changes. The system has sufficient disk space and, more critically, the memory capacity to handle the increased index size without a negative impact on performance.

I'm a little late to the review party, but this all seems reasonable to me. Nice analysis, Erik!