Review paging setup for WMCS with onboarding in alertmanager
Closed, Resolved, Public

Description

Scope: the cloud hosts in production and perhaps toolschecker. Internal cloud observability is being handled in T284860 and T194333.

There's a template for onboarding into alertmanager, which currently has Phabricator integrations as well as IRC and email/VO. Shall we try to set that up?

From there, we could use a clearer definition of our alerting: when we alert, and what to do when an alert comes in (so this is also a documentation task).

@dcaro proposed this general structure:

Needs action now -> page + task
Needs action -> task
Does not need action -> to logging system (probably not an alert)
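
To make the three tiers above concrete, here is a minimal Python sketch of the mapping from urgency to destination. It is only an illustration: the `Urgency` enum, the `ROUTES` table, and the channel labels ("page", "task", "log") are hypothetical names chosen for this example and do not reflect alertmanager's actual configuration format.

```python
from enum import Enum
from typing import Tuple


class Urgency(Enum):
    """The three tiers from the proposed structure (names are hypothetical)."""
    ACT_NOW = "needs action now"
    ACT = "needs action"
    NO_ACTION = "does not need action"


# Hypothetical channel labels; the real receivers would be whatever
# the alertmanager onboarding ends up defining.
ROUTES = {
    Urgency.ACT_NOW: ("page", "task"),
    Urgency.ACT: ("task",),
    Urgency.NO_ACTION: ("log",),
}


def destinations(urgency: Urgency) -> Tuple[str, ...]:
    """Return where a notification of the given urgency should be sent."""
    return ROUTES[urgency]
```

Keeping the mapping in one table makes the policy easy to document next to the alerts themselves: anything that lands in the last tier is a candidate for removal as an alert.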

In general, icinga allows runbooks to be included with alerts, and reviewing our current alerts to add the right runbooks would definitely help. The systemd alerts also need attention; that is covered in the subtask.
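
As a rough aid for the runbook review, the following Python sketch lists alerts that have no runbook attached. It assumes the alert definitions have already been exported into plain dictionaries with hypothetical `name` and `runbook` keys; it does not use any icinga or alertmanager API, and the example data is made up.

```python
from typing import Dict, List


def alerts_missing_runbook(alerts: List[Dict[str, str]]) -> List[str]:
    """Return the names of alerts whose 'runbook' field is absent or blank."""
    return [a["name"] for a in alerts if not a.get("runbook", "").strip()]


# Made-up example data, for illustration only.
example_alerts = [
    {"name": "example_alert_a", "runbook": "https://example.org/runbook-a"},
    {"name": "example_alert_b"},
    {"name": "example_alert_c", "runbook": "  "},
]

for name in alerts_missing_runbook(example_alerts):
    print(f"missing runbook: {name}")
```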