Observe results from JVM options/heap memory changes
Closed, Resolved | Public | 2 Estimated Story Points

Assigned To

Authored By

	bking
	Nov 22 2022, 8:33 PM

Description

Per T319020 and T323612, we made some JVM-related changes to the Elasticsearch config (justification is in the respective tickets). Opening this ticket to:

Observe performance for the next 2 weeks
Based on observations, decide whether to keep or roll back the changes.

Details

	Subject	Repo	Branch	Lines +/-
	cloudelastic: bump smaller cluster heap from 10 to 12G	operations/puppet	production	+2 -2


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Gehel	T319020 Reset to upstream java GC options and remove redundant JVM options
		Resolved		Gehel	T323646 Observe results from JVM options/heap memory changes

Event Timeline

bking created this task. Nov 22 2022, 8:33 PM

MPhamWMF set the point value for this task to 2. Nov 28 2022, 4:48 PM

MPhamWMF moved this task from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board.

We changed the java GC options above to reduce old GC alerts, but we had another one today for cloudelastic:

[12:25:01]  <+jinxer-wm> (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency

Link to Grafana stats

We had a few more alerts for excessive old GC time on cloudelastic last week, and now we see one for production:

(CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance elastic1089-production-search-omega-eqiad is running the old gc excessively

Grafana link

bking mentioned this in T324500: Improve monitoring knowledge for Elasticsearch garbage collection. Dec 5 2022, 6:50 PM

Gehel moved this task from Blocked/Waiting to Needs Reporting on the Discovery-Search (Current work) board. Dec 19 2022, 4:30 PM

Gehel closed this task as Resolved. Jan 13 2023, 9:55 AM

Gehel claimed this task.

Change 881474 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cloudelastic: bump smaller cluster heap from 10 to 12G

https://gerrit.wikimedia.org/r/881474

gerritbot added a project: Patch-For-Review. Jan 18 2023, 9:14 PM

Change 881474 merged by Bking:

[operations/puppet@production] cloudelastic: bump smaller cluster heap from 10 to 12G

https://gerrit.wikimedia.org/r/881474

Maintenance_bot removed a project: Patch-For-Review. Jan 18 2023, 9:30 PM

Mentioned in SAL (#wikimedia-operations) [2023-01-18T22:03:32Z] <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646

Mentioned in SAL (#wikimedia-operations) [2023-01-18T22:35:22Z] <bking@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
