
Observe results from JVM options/heap memory changes
Closed, Resolved · Public · 2 Estimated Story Points

Description

Per T319020 and T323612, we made some JVM-related changes to the Elastic config (justification in the respective tickets). Opening this ticket to:

  • Observe performance for the next 2 weeks
  • Based on observations, decide whether to keep or roll back the changes.
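One way to put a number on "observe performance" is to sample old-generation GC counters at intervals and compare them. The sketch below assumes the samples come from the Elasticsearch nodes stats API (`GET _nodes/stats/jvm`) and have already been parsed from JSON; the function name is hypothetical, and the threshold used by the actual CirrusSearchHighOldGCFrequency alert is not stated in this ticket.

```python
def old_gc_rate_per_hour(before, after, interval_seconds):
    """Return {node_name: old-GC collections per hour} between two samples.

    `before` and `after` are parsed responses from GET _nodes/stats/jvm,
    taken `interval_seconds` apart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    rates = {}
    for node_id, node in after["nodes"].items():
        prev = before["nodes"].get(node_id)
        if prev is None:
            continue  # node was not present in the first sample

        count_now = node["jvm"]["gc"]["collectors"]["old"]["collection_count"]
        count_prev = prev["jvm"]["gc"]["collectors"]["old"]["collection_count"]
        delta = count_now - count_prev
        if delta < 0:
            continue  # counter went backwards, e.g. the node restarted

        rates[node["name"]] = delta * 3600 / interval_seconds
    return rates
```

Comparing these rates before and after a JVM option or heap change, per node, gives a rough basis for the keep-or-roll-back decision; the Grafana dashboards remain the primary source.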

Event Timeline

MPhamWMF set the point value for this task to 2. Nov 28 2022, 4:48 PM

We changed the Java GC options above to reduce old GC alerts, but we had another one today for cloudelastic:

[12:25:01]  <+jinxer-wm> (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency

Link to Grafana stats

We had a few more alerts for excessive old GC time on cloudelastic last week, but now we see one for production:

(CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance elastic1089-production-search-omega-eqiad is running the old gc excessively

Grafana link

Gehel claimed this task.

Change 881474 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cloudelastic: bump smaller cluster heap from 10 to 12G

https://gerrit.wikimedia.org/r/881474

Change 881474 merged by Bking:

[operations/puppet@production] cloudelastic: bump smaller cluster heap from 10 to 12G

https://gerrit.wikimedia.org/r/881474

Mentioned in SAL (#wikimedia-operations) [2023-01-18T22:03:32Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646

Mentioned in SAL (#wikimedia-operations) [2023-01-18T22:35:22Z] <bking@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646