While evaluating the new, larger ranking models we ran into an issue where queries run in 150ms on codfw but take 1s on eqiad. This turned out to be because codfw was caching models while eqiad was churning the model cache. The default cache size of 10MB appears to be too small to hold the models in use. Compiling models, especially large ones, can take a second or more and is not something we can have happening regularly. Increase the cache size to 100MB, which is still a very small fraction of heap but should be large enough to prevent churn.
Sadly this is not currently updatable via the cluster settings API (task filed upstream), so we need to do a rolling restart.
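
For illustration, the churn described above can be reproduced with a toy size-bounded LRU cache. This is only a sketch: the model names and the 6MB model size are hypothetical, and the real model cache may use a different eviction policy. It shows that when two models do not fit in the cache together, alternating between them forces a compile on every request, while a larger cache compiles each model once.

```python
from collections import OrderedDict

MB = 1024 * 1024


def count_compiles(cache_bytes, model_sizes, requests):
    """Count model compilations for a size-bounded LRU cache."""
    cache = OrderedDict()  # model name -> size in bytes, oldest first
    used = 0
    compiles = 0
    for name in requests:
        if name in cache:
            cache.move_to_end(name)  # cache hit: mark as most recently used
            continue
        compiles += 1  # cache miss: the model has to be compiled
        size = model_sizes[name]
        # Evict least recently used models until the new one fits.
        while cache and used + size > cache_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        if size <= cache_bytes:
            cache[name] = size
            used += size
    return compiles


# Hypothetical sizes: two 6MB models queried alternately.
model_sizes = {"model_a": 6 * MB, "model_b": 6 * MB}
requests = ["model_a", "model_b"] * 50

small = count_compiles(10 * MB, model_sizes, requests)
large = count_compiles(100 * MB, model_sizes, requests)
print(small, large)
```

With the 10MB cache every one of the 100 requests triggers a compile; with the 100MB cache only the first request for each model does.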