Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
cloudelastic: bump smaller cluster heap from 10 to 12G | operations/puppet | production | +2 -2
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Gehel | T319020 Reset to upstream java GC options and remove redundant JVM options
Resolved | | Gehel | T323646 Observe results from JVM options/heap memory changes
Event Timeline
We changed the Java GC options above to reduce old-GC alerts, but we had another one today for cloudelastic:
[12:25:01] <+jinxer-wm> (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
We had a few more alerts for excessive old GC time last week for cloudelastic, but now we see one for the production cluster:
(CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance elastic1089-production-search-omega-eqiad is running the old gc excessively
Change 881474 had a related patch set uploaded (by Bking; author: Bking):
[operations/puppet@production] cloudelastic: bump smaller cluster heap from 10 to 12G
Change 881474 merged by Bking:
[operations/puppet@production] cloudelastic: bump smaller cluster heap from 10 to 12G
Mentioned in SAL (#wikimedia-operations) [2023-01-18T22:03:32Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
Mentioned in SAL (#wikimedia-operations) [2023-01-18T22:35:22Z] <bking@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646