A single bot running morelike queries on cebwiki caused the overall p95 morelike latencies to increase, which in turn caused the CirrusSearchMoreLikeLatencyTooHigh alert to flap on and off.
It appears that morelike is particularly slow on this wiki (>1s). This might be because the number of docs per shard is relatively high (~3.8mil/shard):
```
index                      shard prirep state   docs    store  ip            node
cebwiki_content_1728036753 2     p      STARTED 3873640 29.4gb 10.192.32.88  elastic2083-production-search-codfw
cebwiki_content_1728036753 2     r      STARTED 3873640 30.5gb 10.192.16.204 elastic2057-production-search-codfw
cebwiki_content_1728036753 2     r      STARTED 3873640 29.3gb 10.192.48.13  elastic2060-production-search-codfw
cebwiki_content_1728036753 1     r      STARTED 3872253 28.4gb 10.192.0.92   elastic2089-production-search-codfw
cebwiki_content_1728036753 1     p      STARTED 3872253 30.9gb 10.192.48.160 elastic2109-production-search-codfw
cebwiki_content_1728036753 1     r      STARTED 3872253 27.6gb 10.192.16.110 elastic2070-production-search-codfw
cebwiki_content_1728036753 3     r      STARTED 3871061 27.1gb 10.192.0.138  elastic2074-production-search-codfw
cebwiki_content_1728036753 3     p      STARTED 3871061 29gb   10.192.48.89  elastic2107-production-search-codfw
cebwiki_content_1728036753 3     r      STARTED 3871061 27.2gb 10.192.16.232 elastic2095-production-search-codfw
cebwiki_content_1728036753 0     r      STARTED 3871687 29.1gb 10.192.48.179 elastic2086-production-search-codfw
cebwiki_content_1728036753 0     p      STARTED 3871687 28.9gb 10.192.16.228 elastic2092-production-search-codfw
cebwiki_content_1728036753 0     r      STARTED 3871687 27.4gb 10.192.0.206  elastic2076-production-search-codfw
```
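For reference, the docs-per-primary figure can be recomputed straight from the `_cat/shards` API. A minimal sketch (the endpoint below is a placeholder, not the actual production URL):

```python
# Sketch: pull per-shard doc counts for cebwiki_content via _cat/shards and
# report docs per primary shard. Endpoint is an assumption, adjust as needed.
import requests

ES = "https://search.svc.codfw.wmnet:9243"  # placeholder cluster endpoint

resp = requests.get(f"{ES}/_cat/shards/cebwiki_content?format=json", timeout=10)
resp.raise_for_status()

primaries = [s for s in resp.json() if s["prirep"] == "p"]
for shard in sorted(primaries, key=lambda s: int(s["shard"])):
    print(f"shard {shard['shard']}: {int(shard['docs']):,} docs ({shard['store']})")

total = sum(int(s["docs"]) for s in primaries)
print(f"total: {total:,} docs across {len(primaries)} primaries "
      f"(~{total // len(primaries):,} docs/shard)")
```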
We should perhaps try re-sharding this wiki to bring this number down and assess whether morelike response times on this wiki improve.
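To get a rough before/after feel for morelike response times, something like the sketch below could be run against the public API. The seed title and iteration count are arbitrary, and this measures end-to-end latency through MediaWiki (possibly served from cache), not just the Elasticsearch query time:

```python
# Sketch: time a handful of morelike searches on cebwiki through the MediaWiki
# search API. Seed title is a placeholder; pick representative pages for a real test.
import time
import statistics
import requests

API = "https://ceb.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "search",
    "srsearch": "morelike:Pilipinas",  # placeholder seed title
    "srlimit": 20,
    "format": "json",
}

timings = []
for _ in range(10):
    start = time.monotonic()
    requests.get(API, params=params, timeout=30).raise_for_status()
    timings.append(time.monotonic() - start)

print(f"median: {statistics.median(timings):.2f}s, max: {max(timings):.2f}s")
```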
AC:
- re-shard cebwiki_content to bring the number of docs per shard down (<2mil/shard); set the shard count to 8? (one possible approach is sketched below)
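At the raw Elasticsearch level, going from 4 to 8 primaries is a valid factor-of-two split via the `_split` API, sketched below. In practice this would more likely be done through the usual CirrusSearch reindex process after a config change, and split also requires a compatible `number_of_routing_shards` on the source index, so treat this purely as an illustration (endpoint and target index name are placeholders):

```python
# Sketch: double the shard count with the Elasticsearch _split API (4 -> 8).
# Endpoint and target index name below are assumptions.
import requests

ES = "https://search.svc.codfw.wmnet:9243"   # placeholder cluster endpoint
SRC = "cebwiki_content_1728036753"           # current index from the listing above
DST = "cebwiki_content_resharded"            # placeholder target index name

# The source index must be write-blocked before it can be split.
requests.put(
    f"{ES}/{SRC}/_settings",
    json={"index.blocks.write": True},
    timeout=30,
).raise_for_status()

# Split into 8 primaries, which would keep each shard under ~2mil docs.
requests.post(
    f"{ES}/{SRC}/_split/{DST}",
    json={"settings": {"index.number_of_shards": 8}},
    timeout=30,
).raise_for_status()
```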
