Prevent depooled Prometheus from sending alerts during maintenance
Closed, ResolvedPublic

Assigned To

Authored By

	fgiunchedi
	Mar 7 2023, 3:52 PM

Description

During the latest switch maintenance (T329073), prometheus1005 was depooled from LVS but not "depooled" from Alertmanager, in the sense that the host kept firing alerts (from its POV, anyway). We should be more proactive and make sure we can effectively prevent a depooled host from sending alerts during maintenance.

Related Objects

Mentioned Here: T329073: eqiad row A switches upgrade

Event Timeline

fgiunchedi created this task. · Mar 7 2023, 3:52 PM

Restricted Application added a subscriber: Aklapper. · Mar 7 2023, 3:52 PM

andrea.denisse subscribed. · Mar 8 2023, 3:25 PM

fgiunchedi added a project: User-fgiunchedi. · Mar 15 2023, 3:14 PM

fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board.

Change 900238 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] DNM: test alertmanager depool for prometheus1006

https://gerrit.wikimedia.org/r/900238

gerritbot added a project: Patch-For-Review. · Mar 16 2023, 10:33 AM

Setting alertmanagers: [] for the host in question is enough to remove its Alertmanager (AM) configuration; see also this Puppet catalog compiler (PCC) run: https://puppet-compiler.wmflabs.org/output/900238/40160/prometheus1006.eqiad.wmnet/index.html
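
To illustrate why an empty list is enough, here is a minimal Python sketch of how a config generator might turn a per-host alertmanagers list into the alerting section of a Prometheus configuration. It is not the actual Puppet template logic: the function name build_alerting_section and the example target are hypothetical, and only the alerting / alertmanagers / static_configs / targets layout comes from the standard Prometheus configuration format.

```python
def build_alerting_section(alertmanagers):
    """Build the 'alerting' part of a Prometheus config (hypothetical helper).

    `alertmanagers` is a list of "host:port" targets. An empty list yields
    an empty dict, so no Alertmanager is configured and the host has
    nowhere to send its alerts.
    """
    if not alertmanagers:
        return {}
    return {
        "alerting": {
            "alertmanagers": [
                {"static_configs": [{"targets": list(alertmanagers)}]}
            ]
        }
    }


if __name__ == "__main__":
    # Pooled host: alerts go to the configured target (example name only).
    print(build_alerting_section(["alertmanager.example.org:9093"]))
    # Depooled host with alertmanagers: [] has no alerting section at all.
    print(build_alerting_section([]))
```

In this sketch the depooled host still evaluates its alerting rules locally, but with no Alertmanager target configured it has no destination for the resulting alerts.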

Documented at https://wikitech.wikimedia.org/wiki/Prometheus#Depool_Prometheus_for_reads_and_writes
