Page MenuHomePhabricator

check_http and SNI support
Closed, ResolvedPublic

Description

While working on T233956: Deploy Thanos (long-term storage) stateless components: sidecar and query I ran into an issue with using check_http as it is now and Envoy strict sni support, specifically check_http without --sni doesn't work:

icinga1001:~$ /usr/lib/nagios/plugins/check_http -H thanos-swift.discovery.wmnet -S -I 10.192.0.192 -u /healthcheck
CRITICAL - Cannot make SSL connection.

Versus

icinga1001:~$ /usr/lib/nagios/plugins/check_http -H thanos-swift.discovery.wmnet -S --sni -I 10.192.0.192 -u /healthcheck
HTTP OK: HTTP/1.1 200 OK - 279 bytes in 1.155 second response time |time=1.154771s;;;0.000000;10.000000 size=279B;;;0

I think we (in my decreasing order of preference) could:

  1. default check_http with --sni for the https cases
  2. relax the sni requirement on the envoy/thanos side
  3. add yet another specialized icinga command definition for https+sni

Event Timeline

I think option 3) is the easiest of all, has no risk to break existing checks and we already have many different check_commands using check_http in different ways so another one should not hurt us.

In modules/nagios_common/templates/checkcommands.cfg.erb we could add a snippet copied from f.e. line 342 to 345 from "check_https_url", call it check_https_url_sni, add the --sni option and then use check_https_url_sni in the check_command parameter of monitoring::service where it's needed.

Mentioned in SAL (#wikimedia-operations) [2020-05-21T10:06:09Z] <godog> test adding --sni to check_http -S on icinga2001 - T253292

I think option 3) is the easiest of all, has no risk to break existing checks and we already have many different check_commands using check_http in different ways so another one should not hurt us.

In modules/nagios_common/templates/checkcommands.cfg.erb we could add a snippet copied from f.e. line 342 to 345 from "check_https_url", call it check_https_url_sni, add the --sni option and then use check_https_url_sni in the check_command parameter of monitoring::service where it's needed.

Option 3) is indeed the easiest but not the simplest I think, I'd rather avoid it unless we absolutely can. To avoid yet another option for check https and to have to surface that all the way to puppet and in the various abstractions we have (e.g. service::catalog in the thanos case)

+1, option 3 is the most expedient but creates the most technical debt.

Re: option 1, I think it would be relatively straightforward to dump all check_http from the puppet db (or from the icinga config files) and make sure the https cases still succeed with --sni.

Change 597765 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] icinga: add --sni to check_http --ssl invocations

https://gerrit.wikimedia.org/r/597765

In terms of check_http usage ATM we have these invocations, for which we'd stop checking the default (non-SNI) certificate:

$ # the only check_http check not named after check_https is check_sso, which I think is fine to be sni-only
$ grep -ir check_https /etc/icinga/objects/ | awk '{print $3}' | sort -u
check_https_hostheader_port_url!localhost!443!/_stats
check_https_phabricator
check_https_port_status!8140!400!/puppet/v3
check_https_port_status!8141!400!/puppet/v3
check_https_redirect!7443!debmonitor.wikimedia.org!/!302!/login/?next=/
check_https_unauthorized!debmonitor.discovery.wmnet!/!400
check_https_url!analytics.wikimedia.org!/
check_https_url!dbtree.wikimedia.org!https://dbtree.wikimedia.org
check_https_url!en.wikipedia.org!/
check_https_url!kartotherian.discovery.wmnet!/osm-intl/6/23/24.png
check_https_url!librenms.wikimedia.org!http://librenms.wikimedia.org
check_https_url!ms-fe.svc.codfw.wmnet!/monitoring/backend
check_https_url!ms-fe.svc.codfw.wmnet!/monitoring/frontend
check_https_url!ms-fe.svc.eqiad.wmnet!/monitoring/backend
check_https_url!ms-fe.svc.eqiad.wmnet!/monitoring/frontend
check_https_url!netbox.wikimedia.org!https://netbox.wikimedia.org
check_https_url!noc.wikimedia.org!https://noc.wikimedia.org
check_https_url!people.wikimedia.org!https://people.wikimedia.org
check_https_url!piwik.wikimedia.org!/health_check
check_https_url!releases.wikimedia.org!/
check_https_url!static-bugzilla.wikimedia.org!/bug1.html
check_https_url!superset.wikimedia.org!/health_check
check_https_url!thanos-swift.discovery.wmnet!/healthcheck
check_https_url!turnilo.wikimedia.org!/health_check
check_https_url!yarn.wikimedia.org!/health_check
check_https_url_at_address_for_string!wikitech-static.wikimedia.org!/wiki/Main_Page?debug=true!Wikitech
check_https_url_for_string!lists.wikimedia.org!/mailman/listinfo/wikimedia-l!'Wikimedia
check_https_url_for_string!lists.wikimedia.org!/pipermail/wikimedia-l/!'The
check_https_url_for_string!registry1001.eqiad.wmnet!/v2/wikimedia-stretch/manifests/latest!schemaVersion
check_https_url_for_string!registry1002.eqiad.wmnet!/v2/wikimedia-stretch/manifests/latest!schemaVersion
check_https_url_for_string!registry2001.codfw.wmnet!/v2/wikimedia-stretch/manifests/latest!schemaVersion
check_https_url_for_string!registry2002.codfw.wmnet!/v2/wikimedia-stretch/manifests/latest!schemaVersion

All usages seem ok to me, the check_https_url!en.wikipedia.org!/ check is for application servers not the frontend/ats (for which we use check_ssl)

Change 598000 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] idp: rename check_sso_redirect command into check_https_sso_redirect

https://gerrit.wikimedia.org/r/598000

Change 598000 merged by Filippo Giunchedi:
[operations/puppet@production] idp: rename check_sso_redirect command into check_https_sso_redirect

https://gerrit.wikimedia.org/r/598000

RLazarus triaged this task as Medium priority.May 22 2020, 6:01 PM

Change 597765 merged by Filippo Giunchedi:
[operations/puppet@production] icinga: add --sni to check_http --ssl invocations

https://gerrit.wikimedia.org/r/597765

Mentioned in SAL (#wikimedia-operations) [2020-05-25T09:04:15Z] <godog> turn on sni by default for check_http --ssl icinga invocations - T253292

fgiunchedi claimed this task.

Change deployed, resolving