Alert when ES indexes are frozen for more than 30 minutes
Closed, Resolved, Public

Description

It went undetected for 12 hours last time; we ought to do slightly better, I guess.

Event Timeline

Joe raised the priority of this task from to Needs Triage.
Joe updated the task description.
Joe added projects: acl*sre-team, observability.
Joe subscribed.
akosiaris subscribed.

How would you manually check whether they are frozen and for how long?

I guess that by frozen indices we mean freezing the jobs that write to Elasticsearch, not closing the indices in Elasticsearch itself. I'm not actually sure how that freezing works; I'll dig into the code to see if I can understand it.
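Whatever the freeze mechanism turns out to be, the alerting side could look roughly like the sketch below. It is only an illustration under stated assumptions: `check_freeze` and `frozen_since` are hypothetical names, the freeze start time is assumed to be obtainable somehow (that part is deliberately left open), and the Nagios-style exit codes are an assumed convention for the monitoring system. Only the 30-minute threshold comes from the task title.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Threshold from the task title; everything else in this sketch is hypothetical.
FREEZE_THRESHOLD = timedelta(minutes=30)

# Nagios-style exit codes (assumed convention for the alerting system).
OK, CRITICAL = 0, 2


def check_freeze(frozen_since: Optional[datetime],
                 now: datetime,
                 threshold: timedelta = FREEZE_THRESHOLD) -> Tuple[int, str]:
    """Return (exit_code, message) for a hypothetical freeze marker.

    frozen_since is None when writes are not frozen, otherwise the time
    at which the freeze started. How that timestamp is obtained is not
    covered here.
    """
    if frozen_since is None:
        return OK, "OK: writes are not frozen"
    frozen_for = now - frozen_since
    minutes = int(frozen_for.total_seconds() // 60)
    # Alert only when the freeze has lasted strictly longer than the threshold.
    if frozen_for > threshold:
        return CRITICAL, f"CRITICAL: writes frozen for {minutes} minutes"
    return OK, f"OK: writes frozen for {minutes} minutes"
```

With this shape, a freeze that started 10 minutes ago stays OK, while one that started 12 hours ago (as in the incident) would be reported as CRITICAL.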

Deskana lowered the priority of this task from High to Low. Dec 8 2016, 11:08 PM
Deskana subscribed.

This hasn't been touched in quite a while, so lowering priority and putting in the "Later" column. If this is important somehow, please feel free to let me know and we can shuffle it around.

It's an explicit follow-up from an incident. These should be prioritized alongside other "fun/new" work appropriately (iow: not dropped).

@greg Good to know. I chatted to @EBernhardson about it before reprioritising and he said it's unclear how relevant this still is, given how our rolling restarts work now. Hopefully @Gehel should know more. :-)

"This hasn't been touched in quite a while, so lowering priority"

I know this is a general Phabricator workflow thing, but I never understood this logic. In other ticket systems, priority would be raised when things had not been touched in a long time, not the other way around.

It's a fair point. I generally use task priority as descriptive; using that lens, if something hasn't been touched for over a year, then it's not really high priority, and keeping it marked as such is misleading. If everything is high priority, then nothing is. :-)

Gehel claimed this task.