Increase visibility of auto-generated tasks for RAID errors
Closed, Resolved (Public)

Description

We failed to track several auto-generated tasks for RAID errors, especially T215892: Degraded RAID on cloudvirt1024.

Let's evaluate whether we can increase the visibility of these auto-generated tasks by subscribing specific people to each task and adding the task to a specific tag.

This is part of the follow-up to https://wikitech.wikimedia.org/wiki/Incident_documentation/20190213-cloudvps

Event Timeline

We discussed this a little bit yesterday, and T216088 was filed to further discuss this. Help there is welcome :)

I wonder, until we sort all this out, could we perhaps address this underlying problem with a simple Herald rule?

For what it's worth, I have a Herald rule that automatically subscribes me to any degraded RAID ticket for the databases. It has proved to be a good way to get my attention; otherwise, monitoring the Operations queue is hard and it is easy to miss things.

marilerr claimed this task.
marilerr raised the priority of this task from High to Unbreak Now!.
marilerr updated the task description.
marilerr removed subscribers: Marostegui, faidon, colewhite and 7 others.
JJMC89 removed marilerr as the assignee of this task.
JJMC89 lowered the priority of this task from Unbreak Now! to High.
JJMC89 updated the task description.
JJMC89 added subscribers: Marostegui, faidon, colewhite and 8 others.
Andrew claimed this task.

We seem to be getting these alerts now.

Icinga alerts for these notifications on cloud/labs hosts were changed to notify the WMCS team in T246130.