
monitor expiration of labvirt-star SSL cert
Closed, Resolved, Public


Agreed with @Andrew to split this out from the parent task because it's a special case.

All other certs are used for either HTTPS or LDAPS, both of which we can monitor with the same Puppet abstractions for Icinga,
but this cert is used by OpenStack Nova. I port-scanned the host a bit and tried to connect to each of the open ports, but I just got
"unknown protocol" errors when trying to speak SSL/TLS to them.

@ArielGlenn found out that it uses the cert for SASL auth, and the relevant config is in libvirtd.conf:

<apergos> /etc/libvirt/libvirtd.conf read that...
<apergos> auth_tcp = "sasl"

Event Timeline

Dzahn created this task. Oct 22 2015, 9:57 PM
Dzahn claimed this task.
Dzahn raised the priority of this task to High.
Dzahn updated the task description.
Dzahn removed a project: Patch-For-Review.
Dzahn set Security to None.
Dzahn added subscribers: Krenair, chasemp, yuvipanda and 8 others.

We might as well just fix it with a simple workaround for this one special case, such as a generic check that turns critical on a certain date, to which we feed the expiration date manually once.
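A minimal sketch of that workaround, assuming a small Python plugin that follows the usual Nagios exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The names `check_fixed_date` and `main` are hypothetical, and the 30-day warning window is an added assumption; the comment above only asks for a check that goes critical on the given date.

```python
import datetime

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_fixed_date(expiry, today, warn_days=30):
    """Return (exit_code, message) for an expiry date supplied by hand.

    warn_days is an assumed early-warning window, not part of the original idea.
    """
    remaining = (expiry - today).days
    if remaining <= 0:
        # Goes critical on the expiry date itself and stays critical afterwards.
        return CRITICAL, f"CRITICAL - cert expired or expires today ({expiry})"
    if remaining <= warn_days:
        return WARNING, f"WARNING - cert expires in {remaining} days ({expiry})"
    return OK, f"OK - cert expires in {remaining} days ({expiry})"


def main(argv):
    """Expect exactly one argument: the expiry date as YYYY-MM-DD."""
    if len(argv) != 1:
        print("UNKNOWN - usage: check_fixed_date YYYY-MM-DD")
        return UNKNOWN
    try:
        expiry = datetime.date.fromisoformat(argv[0])
    except ValueError:
        print(f"UNKNOWN - not a YYYY-MM-DD date: {argv[0]!r}")
        return UNKNOWN
    code, message = check_fixed_date(expiry, datetime.date.today())
    print(message)
    return code
```

Used as a real plugin, the script would end with `sys.exit(main(sys.argv[1:]))` so Icinga sees the exit code. The obvious drawback is the one implied above: the date has to be updated by hand every time the cert is renewed.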

Restricted Application added a project: Cloud-Services. Oct 22 2015, 10:00 PM

@neon:/usr/lib/nagios/plugins# ./check_http -I labvirt1001.eqiad.wmnet -p 5925 -S

CRITICAL - Cannot make SSL connection
...error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:s23_clnt.c:759:
HTTP CRITICAL - Error on receive

nc labvirt1001.eqiad.wmnet 5901
RFB 003.008

<apergos> tcp 0 0 *:5906 *:* LISTEN 8360/kvm
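The `nc` output explains the `check_http -S` failure: the port greets the client with `RFB 003.008`, the opening message of the RFB (VNC) protocol, version 3.8, rather than waiting for a TLS ClientHello. Together with `kvm` owning port 5906, this suggests these 59xx ports are most likely the VNC consoles of the KVM guests, not a TLS endpoint using the labvirt-star cert. A hedged sketch of the same probe in Python follows; the helper names are hypothetical.

```python
import socket

# An RFB (VNC) server opens every connection with a 12-byte ProtocolVersion
# message such as b"RFB 003.008\n" (RFC 6143, section 7.1.1).
RFB_BANNER_LEN = 12


def grab_banner(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Open a plain TCP connection and return what the server sends first.

    A single recv() can in principle return fewer than 12 bytes; a sturdier
    probe would loop until it has the full banner or times out.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.recv(RFB_BANNER_LEN)


def looks_like_rfb(banner: bytes) -> bool:
    """True if the bytes have the shape of an 'RFB xxx.yyy' version string."""
    return (
        len(banner) == RFB_BANNER_LEN
        and banner.startswith(b"RFB ")
        and banner.endswith(b"\n")
    )
```

For example, `looks_like_rfb(grab_banner("labvirt1001.eqiad.wmnet", 5901))` would be expected to return `True` for the port probed with `nc` above.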

Change 249147 had a related patch set uploaded (by Andrew Bogott):
Add monitoring for the kvm ssl cert, labvirt-star

Change 249147 merged by Dzahn:
Add monitoring for the kvm ssl cert, labvirt-star

Let me steal this for a moment to check why the checks went to status UNKNOWN. I said earlier I would, but didn't get to it yet.

Dzahn claimed this task. Oct 27 2015, 10:59 PM

We either need to make this an NRPE check that runs on the monitored hosts where the certs are, or we need to copy the cert to neon.

Also, for some reason the change above did not define the check_command itself yet, only the services using it.
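The NRPE route means the check runs on the labvirt host itself, next to the cert file, so nothing needs to be copied to neon. As a rough sketch of what such a local check has to do (this is not the actual `check_ssl_certfile` plugin, whose source isn't shown in this task), the Python below calls `openssl x509 -enddate -noout -in PATH`, converts its `notAfter=` line with the standard library's `ssl.cert_time_to_seconds`, and maps the remaining lifetime onto Nagios states. The function names, the 30/7-day thresholds, and the assumption that `openssl` is on the host's PATH are all illustrative.

```python
import ssl
import subprocess
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_not_after(openssl_output: str) -> int:
    """Turn 'notAfter=Oct 28 00:00:00 2018 GMT' into seconds since the epoch."""
    line = openssl_output.strip()
    prefix = "notAfter="
    if not line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {line!r}")
    # ssl.cert_time_to_seconds parses the GMT timestamp format openssl prints.
    return ssl.cert_time_to_seconds(line[len(prefix):])


def read_not_after(cert_path: str) -> int:
    """Ask the openssl CLI for the expiry time of the (first) cert in the file."""
    out = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_not_after(out)


def evaluate(not_after, now, warn_days=30, crit_days=7):
    """Map remaining lifetime to a Nagios state, testing CRITICAL first."""
    days_left = (not_after - now) / 86400
    if days_left < crit_days:
        return CRITICAL, f"CRITICAL - cert expires in {days_left:.1f} days"
    if days_left < warn_days:
        return WARNING, f"WARNING - cert expires in {days_left:.1f} days"
    return OK, f"OK - cert expires in {days_left:.1f} days"


def main(argv) -> int:
    """Expect exactly one argument: the path to the PEM cert file."""
    if len(argv) != 1:
        print("UNKNOWN - usage: check_certfile_sketch CERT_PATH")
        return UNKNOWN
    try:
        not_after = read_not_after(argv[0])
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"UNKNOWN - cannot read certificate: {exc}")
        return UNKNOWN
    code, message = evaluate(not_after, time.time())
    print(message)
    return code
```

A real plugin would end with `sys.exit(main(sys.argv[1:]))` and be referenced from an NRPE `command[...]` line that passes the cert's file path. Testing CRITICAL before WARNING matters here: in the opposite order, a cert inside the critical window would also satisfy the warning test and never report CRIT.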

Change 249328 had a related patch set uploaded (by Dzahn):
labs kvm ssl cert monitoring: fix it

Change 249328 merged by Dzahn:
labs kvm ssl cert monitoring: fix it

Dzahn added a comment. (Edited) Oct 28 2015, 12:06 AM

After this, the plugin got created on labvirt1001:

Notice: /Stage[main]/Openstack::Nova::Compute/File[/usr/local/lib/nagios/plugins/check_ssl_certfile]/ensure: defined content...

The NRPE command got defined:

root@labvirt1001:/etc/nagios/nrpe.d# cat check_kvm_ssl_cert.cfg

# File generated by puppet. DO NOT edit by hand
command[check_kvm_ssl_cert]=/usr/local/lib/nagios/plugins/check_ssl_certfile labvirt-star.eqiad.wmnet

But we still need to fix the path to the cert.

Change 249331 had a related patch set uploaded (by Dzahn):
labs kvm ssl cert monitoring: fix nrpe command

Change 249331 merged by Dzahn:
labs kvm ssl cert monitoring: fix nrpe command

Dzahn closed this task as Resolved. Oct 28 2015, 12:37 AM

Thanks for fixing, sorry my patch was dumb :(

Change 346236 had a related patch set uploaded (by Dzahn):
[operations/puppet@production] nagios_common: fix/enhance check_ssl_certfile plugin

Change 346236 merged by Dzahn:
[operations/puppet@production] nagios_common: fix/enhance check_ssl_certfile plugin

Dzahn added a comment. Apr 4 2017, 6:11 PM

After the merge above, the Icinga checks now turned CRIT as they should have. Due to a bug, they had previously stayed at WARN for longer than expected, which meant they didn't show up on IRC because of the bot settings.