
Investigate perf regression after elasticsearch 5.3.2 deployment
Closed, Resolved · Public


The Elasticsearch 5.3.2 deployment seems to have caused a visible perf regression in query latency percentiles:

  • fulltext: +20ms
  • morelike: +40ms
  • compsuggest: +2ms

Young GC activity seems to have jumped as well, while the amount of heap used seems to have decreased.

Event Timeline

dcausse created this task. Jun 12 2017, 8:34 AM
Restricted Application added projects: Discovery, Discovery-Search. Jun 12 2017, 8:34 AM
Restricted Application added a subscriber: Aklapper.
Gehel reopened this task as Open. Jun 12 2017, 2:56 PM

Change 358383 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: remove UseConcMarkSweepGC

Gehel triaged this task as High priority. Jun 12 2017, 2:57 PM

Mentioned in SAL (#wikimedia-operations) [2017-06-13T14:22:28Z] <gehel> restarting elasticsearch on relforge to validate GC configuration - T167636

Mentioned in SAL (#wikimedia-operations) [2017-06-13T15:09:12Z] <gehel> applying new GC configuration on elastic1018 - T167636

Gehel added a comment. Jun 14 2017, 4:15 PM

elastic1018 is looking good, with significantly lower GC times than the other nodes (see Grafana). The next test is to roll out to the whole cluster...
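
For anyone wanting to reproduce this kind of per-node comparison without the dashboards, here is a minimal sketch. It assumes the JSON returned by the Elasticsearch nodes stats API (`GET /_nodes/stats/jvm`) has already been fetched and parsed; the function name, the node names and the numbers in `sample` are made up for illustration and are not measurements from this cluster.

```python
def young_gc_ms_per_collection(nodes_stats):
    """Return {node_name: mean young-GC milliseconds per collection}.

    `nodes_stats` is the parsed JSON body of GET /_nodes/stats/jvm.
    The counters are cumulative since each JVM started.
    """
    result = {}
    for node in nodes_stats.get("nodes", {}).values():
        young = node["jvm"]["gc"]["collectors"]["young"]
        count = young["collection_count"]
        total_ms = young["collection_time_in_millis"]
        # Avoid dividing by zero on a freshly started node.
        result[node["name"]] = total_ms / count if count else 0.0
    return result


# Hypothetical response fragment, trimmed to the fields used above.
sample = {
    "nodes": {
        "id-a": {
            "name": "node-a",
            "jvm": {"gc": {"collectors": {"young": {
                "collection_count": 1000,
                "collection_time_in_millis": 50000,
            }}}},
        },
        "id-b": {
            "name": "node-b",
            "jvm": {"gc": {"collectors": {"young": {
                "collection_count": 500,
                "collection_time_in_millis": 10000,
            }}}},
        },
    }
}

per_node = young_gc_ms_per_collection(sample)
```

Because the counters are cumulative, a node that was restarted recently is not directly comparable to one with a long uptime; taking the difference between two snapshots over the same interval would probably be a fairer comparison.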

Change 358383 merged by Gehel:
[operations/puppet@production] elasticsearch: remove UseConcMarkSweepGC

The change is merged. It requires a full cluster restart to take effect, so the task cannot actually be closed until then. The cluster restart is planned to start on Monday, June 19th.
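
Since the new JVM flags only apply once a node has been restarted, one way to track the rollout is to look at the arguments each running JVM was started with. The sketch below assumes the parsed JSON of the Elasticsearch nodes info API (`GET /_nodes/jvm`), which lists each node's `input_arguments`; the helper name and the sample node names are hypothetical, and the only flag checked is the one named in the change title.

```python
CMS_FLAG = "-XX:+UseConcMarkSweepGC"


def nodes_still_on_cms(nodes_info):
    """Return the sorted names of nodes whose JVM was started with CMS_FLAG.

    `nodes_info` is the parsed JSON body of GET /_nodes/jvm.
    """
    pending = []
    for node in nodes_info.get("nodes", {}).values():
        if CMS_FLAG in node["jvm"].get("input_arguments", []):
            pending.append(node["name"])
    return sorted(pending)


# Hypothetical response fragment: node-a was restarted, node-b was not.
sample_info = {
    "nodes": {
        "id-a": {"name": "node-a",
                 "jvm": {"input_arguments": ["-Xms30g", "-Xmx30g"]}},
        "id-b": {"name": "node-b",
                 "jvm": {"input_arguments": ["-Xms30g", "-Xmx30g", CMS_FLAG]}},
    }
}

pending_restart = nodes_still_on_cms(sample_info)
```

An empty list would suggest every node has picked up the new configuration.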

debt closed this task as Resolved. Jun 16 2017, 5:32 PM
debt claimed this task.
debt added a subscriber: debt.

Resolving; the final restart of the clusters will happen on Monday, June 19, 2017.

debt removed debt as the assignee of this task. Jun 19 2017, 6:22 PM