Observe results from JVM options/heap memory changes
Closed, Resolved | Public | 2 Estimated Story Points

Description

Per T319020 and T323612, we made some JVM-related changes to the Elastic config (justification in the respective tickets). Opening this ticket to:

  • Observe performance for the next 2 weeks
  • Based on observations, decide whether to keep or roll back the changes.

Event Timeline

We changed the Java GC options above to reduce old GC alerts, but we had another one today for cloudelastic:

[12:25:01]  <+jinxer-wm> (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency

Link to Grafana stats
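
For spot-checking the same signal outside Grafana during the observation window, here is a minimal sketch (not the alert's actual rule) that compares two snapshots of the Elasticsearch nodes stats response (`GET _nodes/stats/jvm`) and reports old-generation collections per hour for each node. The function names and the sampling interval are hypothetical. The alert's threshold is not stated in this ticket, so none is applied here.

```python
from typing import Dict


def old_gc_counts(nodes_stats: dict) -> Dict[str, int]:
    """Map node name -> cumulative old-generation collection count.

    Expects the JSON body of GET _nodes/stats/jvm, already parsed into a dict.
    """
    counts: Dict[str, int] = {}
    for node in nodes_stats.get("nodes", {}).values():
        old = node["jvm"]["gc"]["collectors"]["old"]
        counts[node["name"]] = old["collection_count"]
    return counts


def old_gc_rate_per_hour(
    before: dict, after: dict, interval_seconds: float
) -> Dict[str, float]:
    """Old GC runs per hour for each node present in both snapshots."""
    first = old_gc_counts(before)
    second = old_gc_counts(after)
    rates: Dict[str, float] = {}
    for name, count in second.items():
        if name not in first:
            continue  # node joined between the two samples
        delta = count - first[name]
        if delta < 0:
            continue  # counter reset, e.g. the JVM restarted between samples
        rates[name] = delta * 3600.0 / interval_seconds
    return rates
```

Nodes that restart between samples (for example during a rolling restart) are skipped, because their counters start again from zero.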

Last week we had a few more excessive old GC alerts for cloudelastic, and now we see one for production:

(CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance elastic1089-production-search-omega-eqiad is running the old gc excessively

Grafana link

Gehel claimed this task.

Change 881474 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cloudelastic: bump smaller cluster heap from 10 to 12G

https://gerrit.wikimedia.org/r/881474

Change 881474 merged by Bking:

[operations/puppet@production] cloudelastic: bump smaller cluster heap from 10 to 12G

https://gerrit.wikimedia.org/r/881474

Mentioned in SAL (#wikimedia-operations) [2023-01-18T22:03:32Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646

Mentioned in SAL (#wikimedia-operations) [2023-01-18T22:35:22Z] <bking@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646