Increase ltr.cache.max_size in Cirrus elasticsearch clusters
Closed, ResolvedPublic
Authored By

	EBernhardson
	Feb 22 2018, 5:09 PM

Description

While evaluating the new, larger ranking models, I ran into an issue where queries run in 150ms on codfw but 1s on eqiad. It turns out that codfw was caching the models while eqiad was churning its model cache: the default cache size of 10MB appears to be too small to hold them. Compiling models, especially large ones, can take a second or more, and is not something we can have happening regularly. Resize the cache to 100MB, which is still a very small fraction of heap but should be large enough to prevent churn.
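To illustrate the churn described above, here is a minimal sketch of a byte-bounded LRU cache. It is not the LTR plugin's actual cache: the LRU eviction policy, the `SizeBoundedLRU` class, and the three 4MB model sizes are all hypothetical, chosen only to show what happens when the working set is slightly larger than the cache.

```python
from collections import OrderedDict

MB = 1024 * 1024


class SizeBoundedLRU:
    """Toy LRU cache bounded by total bytes (hypothetical, not the plugin's code)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = OrderedDict()  # model name -> size in bytes
        self.compiles = 0  # cache misses, each standing in for a model compilation

    def get(self, name, size):
        if name in self.entries:
            self.entries.move_to_end(name)  # hit: mark as most recently used
            return
        self.compiles += 1  # miss: the model would have to be compiled again
        if size > self.max_bytes:
            return  # too large to cache at all
        # Evict least recently used entries until the new model fits.
        while self.used + size > self.max_bytes:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        self.entries[name] = size
        self.used += size


# Hypothetical model sizes: 12MB in total, slightly more than a 10MB cache.
MODELS = {"model_a": 4 * MB, "model_b": 4 * MB, "model_c": 4 * MB}


def count_compiles(max_bytes, rounds=100):
    """Request every model in turn for `rounds` rounds and count compilations."""
    cache = SizeBoundedLRU(max_bytes)
    for _ in range(rounds):
        for name, size in MODELS.items():
            cache.get(name, size)
    return cache.compiles


if __name__ == "__main__":
    print("10MB cache: ", count_compiles(10 * MB), "compilations")
    print("100MB cache:", count_compiles(100 * MB), "compilations")
```

Under these assumptions the 10MB cache misses on every request, because each new model evicts the one that will be needed next, while the 100MB cache compiles each model only once.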

Sadly, this setting is not currently updatable via the cluster settings API (task filed upstream), so we need to do a rolling restart.

Details

	Subject	Repo	Branch	Lines +/-
	Resize the Cirrus LTR model cache	operations/puppet	production	+2 -0
	Resize the Cirrus LTR model cache	operations/puppet	production	+18 -0


Related Objects

Status	Assigned	Task
Invalid	None	T174064 [FY 2017-18 Objective] Implement advanced search methodologies
Resolved	EBernhardson	T161632 [Epic] Improve search by researching and deploying machine learning to re-rank search results
Resolved	EBernhardson	T162279 Collect ideas for feature engineering of LTRank
Resolved	EBernhardson	T187148 Evaluate features provided by `query_explorer` functionality of ltr plugin
Resolved	EBernhardson	T188015 Increase ltr.cache.max_size in Cirrus elasticsearch clusters

Event Timeline

EBernhardson triaged this task as Medium priority. Feb 22 2018, 5:09 PM

EBernhardson created this task.

Change 413407 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/puppet@production] Resize the Cirrus LTR model cache

https://gerrit.wikimedia.org/r/413407

gerritbot added a project: Patch-For-Review. Feb 22 2018, 5:10 PM

EBernhardson moved this task from Incoming to Needs review on the Discovery-Search (Current work) board. Feb 22 2018, 5:12 PM

EBernhardson updated the task description.

Change 413407 merged by Gehel:
[operations/puppet@production] Resize the Cirrus LTR model cache

https://gerrit.wikimedia.org/r/413407

Change 414637 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] Resize the Cirrus LTR model cache

https://gerrit.wikimedia.org/r/414637

Change 414637 merged by Gehel:
[operations/puppet@production] Resize the Cirrus LTR model cache

https://gerrit.wikimedia.org/r/414637

EBernhardson moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. Mar 1 2018, 10:34 PM

Tested again today after the restarts were completed. After an initial warmup to get the models into the caches, I am now seeing consistent performance on both clusters, in an acceptable range.

Seems to be working as expected, closing this one.
