
tendril cert expiry alerts on dbmonitor hosts
Closed, Resolved · Public

Description

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=tendril

^ the green one is OK, but the yellow ones should be gone

| Host | Service | Status | Last Check | Duration | Attempt | Status Information |
| --- | --- | --- | --- | --- | --- | --- |
| dbmonitor1001 | HTTPS-tendril | WARNING (acknowledged) | 2017-04-04 18:56:41 | 1d 17h 2m 52s | 3/3 | SSL WARNING - Certificate tendril.wikimedia.org valid until 2017-04-10 01:53:01 +0000 (expires in 5 days) |
| dbmonitor2001 | HTTPS-tendril | WARNING (acknowledged) | 2017-04-04 18:55:54 | 1d 17h 8m 40s | 3/3 | SSL WARNING - Certificate tendril.wikimedia.org valid until 2017-04-10 01:47:36 +0000 (expires in 5 days) |
| einsteinium | HTTPS-tendril | OK | 2017-04-04 18:56:42 | 54d 4h 48m 26s | 1/3 | SSL OK - Certificate tendril.wikimedia.org valid until 2017-05-28 03:56:00 +0000 (expires in 53 days) |

This is because the monitoring check gets added per role::tendril, and moving tendril from einsteinium to dbmonitor is an ongoing thing afaict.

Multiple hosts get the check, but only one actually has the tendril name in DNS; as of right now that is still einsteinium.

So there is no real cert expiry issue here; the actual tendril.wikimedia.org is fine and has LE autorenewal.
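For reference, the "expires in N days" figures in the Icinga output above can be reproduced from the "valid until" timestamps. This is a minimal sketch, not the actual check plugin. It assumes the Last Check times are in UTC and that the day count is rounded down; `days_until_expiry` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def days_until_expiry(checked_at: str, valid_until: str) -> int:
    """Return whole days between a check time and a certificate's expiry.

    checked_at:  "YYYY-MM-DD HH:MM:SS", assumed to be UTC (assumption).
    valid_until: "YYYY-MM-DD HH:MM:SS +0000", as printed in the status text.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    checked = datetime.strptime(checked_at, fmt).replace(tzinfo=timezone.utc)
    expires = datetime.strptime(valid_until, fmt + " %z")
    # timedelta.days drops the partial day, i.e. rounds down for future dates.
    return (expires - checked).days


# dbmonitor1001: stale cert on a host that does not serve the tendril name
print(days_until_expiry("2017-04-04 18:56:41", "2017-04-10 01:53:01 +0000"))
# einsteinium: the cert actually served for tendril.wikimedia.org
print(days_until_expiry("2017-04-04 18:56:42", "2017-05-28 03:56:00 +0000"))
```

Under those assumptions the two calls give 5 and 53, matching the WARNING and OK lines respectively.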

Add a setting so that the monitoring is applied only on the host that currently has the tendril CNAME in DNS, whichever host that is, so that the role can be moved freely without causing false alerts.
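One way the selection logic could look, as a rough sketch only: resolve the service name, then keep the check only on the role host whose name matches the canonical name. This is an illustration in Python, not Puppet code from the repository. `canonical_name` and `hosts_to_monitor` are hypothetical names, and the `example.org` host names in the usage are placeholders.

```python
import socket
from typing import Callable, Iterable, List


def canonical_name(service_name: str) -> str:
    """Return the primary host name the resolver reports for service_name.

    For a CNAME this is normally the CNAME target, but that depends on the
    local resolver configuration (assumption).
    """
    primary, _aliases, _addresses = socket.gethostbyname_ex(service_name)
    return primary


def hosts_to_monitor(
    role_hosts: Iterable[str],
    service_name: str,
    resolve: Callable[[str], str] = canonical_name,
) -> List[str]:
    """Keep only the role hosts that the service name currently points at."""
    target = resolve(service_name).rstrip(".").lower()
    return [h for h in role_hosts if h.rstrip(".").lower() == target]


# Usage with a stub resolver, so no real DNS lookup is made.
role_hosts = [
    "dbmonitor1001.example.org",
    "dbmonitor2001.example.org",
    "einsteinium.example.org",
]
selected = hosts_to_monitor(
    role_hosts,
    "tendril.example.org",
    resolve=lambda name: "einsteinium.example.org.",
)
print(selected)
```

With the stub resolver only the einsteinium placeholder is selected, so the other role hosts would not get a cert check until the CNAME moves.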

Event Timeline

Dzahn created this task. Apr 4 2017, 6:51 PM
Restricted Application added a subscriber: Aklapper. Apr 4 2017, 6:51 PM
Dzahn claimed this task. Apr 4 2017, 6:51 PM
Dzahn added a subscriber: akosiaris.
Dzahn added a subscriber: jcrespo.
Dzahn triaged this task as Medium priority. Apr 4 2017, 6:54 PM
Dzahn updated the task description.
Dzahn updated the task description. Apr 4 2017, 6:57 PM
jcrespo added a subscriber: Marostegui.
Marostegui moved this task from Triage to Backlog on the DBA board. Apr 11 2017, 9:03 AM

Change 348172 had a related patch set uploaded (by Dzahn):
[operations/puppet@production] tendril: skip cert monitoring where Letsencrypt is disabled

https://gerrit.wikimedia.org/r/348172

Change 348172 merged by Dzahn:
[operations/puppet@production] tendril: skip cert monitoring where Letsencrypt is disabled

https://gerrit.wikimedia.org/r/348172

Fixed. The false positives are gone; the real check stays and is OK.

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=tendril

Dzahn closed this task as Resolved. Apr 14 2017, 12:33 AM
Dzahn removed a project: Patch-For-Review.