Maniphest T194013

Possible deadlock in the elastic cache used by the ltr plugin
Closed, ResolvedPublic

Assigned To

Authored By

	dcausse
	May 7 2018, 9:47 AM

Description

I was tracking this directly on GitHub, but after further investigation it looks like we are at risk from this issue.
(ref https://github.com/o19s/elasticsearch-learning-to-rank/issues/153)
I am adding a reference in Phab so that we can discuss the priority of this task relative to the risk.
In light of the analysis posted on GitHub, ~~I see no other reasons except pure chance that we do not enter this deadlock on our production cluster.~~ Elasticsearch 5.5.3 is affected, but since we are running 5.5.2 we are only hit by a minor bug where expired entries are not evicted in time.
Running an Elasticsearch version affected by this bug (5.5.3+) could be catastrophic: all search threads would stop responding, blocking every service that uses the _search endpoints on the cluster (Cirrus, translation search, Phab and possibly others).

Event Timeline

dcausse created this task. May 7 2018, 9:47 AM

Restricted Application edited projects, added Discovery-ARCHIVED, Discovery-Search; removed Discovery-Search (Current work). May 7 2018, 9:47 AM

Restricted Application added a subscriber: Aklapper.

Digging further, we are not affected: we are running 5.5.2, and the code involved in the deadlock was backported to 5.5.3.
5.5.2 still suffers from a cache bug, but not a deadlock.
(ref https://github.com/elastic/elasticsearch/pull/26516)
This issue is not as urgent as I originally thought, but it should be treated as a blocker for any upgrade to a newer version of Elasticsearch.
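
To make the "upgrade blocker" rule above concrete, here is a minimal sketch of a version gate. It is only an illustration: the names `FIRST_AFFECTED`, `parse_version` and `blocks_upgrade` are hypothetical, the 5.5.3 boundary is taken from this task, and the set of releases that carry the upstream fix is left as a parameter because it is not known here.

```python
# Hypothetical helper: decide whether an upgrade target should be blocked.
# Assumption: versions are plain "major.minor.patch" strings.

# First release this task reports as affected by the deadlock.
FIRST_AFFECTED = (5, 5, 3)


def parse_version(version):
    """Turn 'major.minor.patch' into a tuple of ints for comparison."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def blocks_upgrade(target, fixed_in=()):
    """Return True if upgrading to `target` should be blocked.

    `fixed_in` is a collection of version strings known to contain the
    fix. It is empty by default: the caller has to supply it once the
    fixed releases are actually known.
    """
    if target in fixed_in:
        return False
    return parse_version(target) >= FIRST_AFFECTED
```

With no known fixed release, `blocks_upgrade("5.5.2")` is false and `blocks_upgrade("5.5.3")` is true, which matches the situation described in this comment.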

dcausse updated the task description. May 7 2018, 11:40 AM

dcausse updated the task description. May 7 2018, 3:39 PM

Upstream opened a bug, https://github.com/elastic/elasticsearch/issues/30428, which now has a pull request attached and is tagged to be backported to 5.6.x. We will probably skip 5.6 and go straight to 6.x, which will get the backport as well.

Resolved for our purposes in search; we should probably add some documentation to the plugin.

• Vvjjkkii renamed this task from Possible deadlock in the elastic cache used by the ltr plugin to 2hdaaaaaaa. Jul 1 2018, 1:11 AM

• Vvjjkkii reopened this task as Open.

• Vvjjkkii removed • EBjune as the assignee of this task.

• Vvjjkkii raised the priority of this task from Medium to High.

• Vvjjkkii added projects: CheckUser, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), Tamil-Sites, Gamepress, Hashtags, Jade, KartoEditor, Language-2018-Apr-June, New-Editor-Experiences, Mail, TCB-Team (now WMDE-TechWish).

• Vvjjkkii updated the task description. (Show Details)

• Vvjjkkii removed a subscriber: Aklapper.

CommunityTechBot renamed this task from 2hdaaaaaaa to Possible deadlock in the elastic cache used by the ltr plugin. Jul 2 2018, 3:20 PM

CommunityTechBot closed this task as Resolved.

CommunityTechBot assigned this task to • EBjune.

CommunityTechBot lowered the priority of this task from High to Medium.

CommunityTechBot updated the task description. (Show Details)

CommunityTechBot removed projects: TCB-Team (now WMDE-TechWish), Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, Jade, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.

CommunityTechBot added a subscriber: Aklapper.
