We've been getting a lot of alerts for excessive garbage collection on the cloudelastic hosts.
Creating this ticket to address the issue. Possible approaches:
- Reduce the GC activity itself
- Detune the alerts
Mentioned in SAL (#wikimedia-operations) [2024-09-03T16:01:58Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply heap settings - bking@cumin2002 - T373895
I merged this Puppet patch to increase the heap size for the secondary clusters on cloudelastic.
Mentioned in SAL (#wikimedia-operations) [2024-09-03T16:26:30Z] <bking@cumin2002> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply heap settings - bking@cumin2002 - T373895
The alerts have cleared, but let's leave this open for a few days so we can see whether the heap size increase actually helped.
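For anyone who wants to check GC load directly rather than waiting on the alerts, here is a rough sketch. It assumes two snapshots of the Elasticsearch `_nodes/stats/jvm` response taken some time apart, already parsed into dicts. The function name and the way snapshots are collected are illustrative only; this is not our alerting rule or any existing tooling.

```python
def gc_overhead_percent(before, after, elapsed_ms):
    """Percent of wall-clock time each node spent in GC between two snapshots.

    `before` and `after` are parsed `_nodes/stats/jvm` responses (assumed to be
    fetched by the caller); `elapsed_ms` is the time between them.
    """

    def total_gc_ms(node_stats):
        # Sum cumulative GC time across all collectors (e.g. young and old).
        collectors = node_stats["jvm"]["gc"]["collectors"]
        return sum(c["collection_time_in_millis"] for c in collectors.values())

    overhead = {}
    for node_id, node in after["nodes"].items():
        prev = before["nodes"].get(node_id)
        if prev is None:
            continue  # node was not present in the first snapshot
        delta_ms = total_gc_ms(node) - total_gc_ms(prev)
        if delta_ms < 0:
            continue  # counters reset, most likely a restart between snapshots
        overhead[node["name"]] = 100.0 * delta_ms / elapsed_ms
    return overhead
```

Comparing the per-node numbers from before and after the heap change (under similar load) should give a rough sense of whether the restart with the new settings made a difference.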
It's been 12 days and I have not seen any new garbage collection alerts for cloudelastic. As such, I'm moving this to "needs review." If/when the Search Platform software engineers are happy, we can close this one completely.