On 2019-12-17 at 13:44 UTC, both server load and QPS dropped on this set of servers, and both have stayed low for the following month.
Per-server QPS (click "show all 49" to see the problem): https://grafana.wikimedia.org/explore?orgId=1&left=%5B%221576589990313%22,%221576590608996%22,%22eqiad%20prometheus%2Fops%22,%7B%22expr%22:%22sum(clamp_min(deriv(elasticsearch_indices_search_query_total%7Bexported_cluster%3D%5C%22production-search-eqiad%5C%22%7D%5B2m%5D),%200))%20by%20(instance)%22,%22context%22:%22explore%22%7D,%7B%22mode%22:%22Metrics%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D
The graph above can also be zoomed out to 30 days, which shows that QPS dropped and stayed low.
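For reference, the query in the link URL-decodes to `sum(clamp_min(deriv(elasticsearch_indices_search_query_total{exported_cluster="production-search-eqiad"}[2m]), 0)) by (instance)`. The Python sketch below is a simplified model of what one evaluation step of that query computes. It is not Prometheus's implementation. For each counter series, it takes a least-squares per-second slope over the 2-minute window (what `deriv()` does) and clamps negative slopes to 0 (`clamp_min`). It then sums the results per instance. The instance names and sample values are hypothetical.

```python
from collections import defaultdict


def deriv(samples):
    """Per-second slope of (timestamp_s, value) samples via simple linear
    regression, modelling PromQL deriv() over one range-vector window."""
    n = len(samples)
    if n < 2:
        return None  # no result for fewer than two samples
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        return None  # all samples share one timestamp; slope undefined
    return cov / var


def per_instance_qps(series):
    """series maps (instance, other_labels) -> [(timestamp_s, counter_value), ...]
    for one 2m window. Returns {instance: qps}, modelling
    sum(clamp_min(deriv(...[2m]), 0)) by (instance)."""
    totals = defaultdict(float)
    for (instance, _labels), samples in series.items():
        slope = deriv(samples)
        if slope is None:
            continue
        totals[instance] += max(slope, 0.0)  # clamp_min(..., 0)
    return dict(totals)


# Hypothetical data: samples every 30 s within a 2-minute window.
series = {
    ("host-a", "idx1"): [(0, 0), (30, 3000), (60, 6000), (90, 9000), (120, 12000)],
    ("host-a", "idx2"): [(0, 0), (30, 300), (60, 600), (90, 900), (120, 1200)],
    # A decreasing series (e.g. after a counter reset) yields a negative slope.
    ("host-b", "idx1"): [(0, 400), (30, 300), (60, 200), (90, 100)],
}
result = per_instance_qps(series)
print(result)
```

Here `host-a` comes out at 110 QPS (100 + 10) and `host-b` at 0, because its negative slope is clamped. Prometheus documents `deriv()` as intended for gauges. The `clamp_min(..., 0)` is what keeps counter resets from showing up as negative QPS in this panel.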
This is suspiciously well correlated with some SAL entries:
13:53 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
13:51 gehel@cumin1001: START - Cookbook sre.hosts.decommission
13:49 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
13:48 gehel@cumin1001: START - Cookbook sre.hosts.decommission
@Gehel Any idea what might have happened here?