
prometheus::blackbox::check::http - allow checking the same virtual host on multiple backends
Closed, Resolved (Public)

Description

current status:

In the class prometheus::blackbox::check::http, which is used for monitoring various misc. services, there is the parameter $server_name, which defaults to $title.

It is described as # @param server_name - an FQDN, the server name to use (during TLS and Host:).

Separate from that, we have # @param instance_label - short-form host name, used as an instance label.

problem statement:

Let's say you have multiple backends serving a single virtual host, example:

virtual host: doc.wikimedia.org
machines: doc1001.eqiad.wmnet, doc2001.codfw.wmnet

which is a very common type of setup.

Now if you want to monitor doc.wikimedia.org you have the following options:

  • use "doc.wikimedia.org" as the resource title but tell puppet with an 'if-then-else' to only monitor the "active host" or one of the 2 hosts
  • use "doc.wikimedia.org" as the resource title and apply it on both instances, puppet run will fail with a "duplicate declaration" error because the same $title is used more than once
  • use the instance name as $title, but now requests go to doc1001.eqiad.wmnet / doc2001.codfw.wmnet and you are not actually monitoring doc.wikimedia.org

suggested fix:

Have separate parameters for "virtual host" and "instance FQDN" so we can truly check "virtual host X on host Y and host Z".

Just like when you manually use curl you also have separate parameters for a virtual host you are requesting and a host you are requesting it from.

Event Timeline

Or is the idea that I should set $server_name explicitly and just avoid that it defaults to $title, so that my $title can vary but $server_name stays the same?

Change 890903 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] doc: fix hostname used in http::blackbox monitoring

https://gerrit.wikimedia.org/r/890903

use "doc.wikimedia.org" as the resource title and apply it on both instances, puppet run will fail with a "duplicate declaration" error because the same $title is used more than once

Resource names need to be unique on a single host, not globally. (Otherwise you could not for example have package { 'ssh-server': ensure => present } on multiple machines!). You can simply have a prometheus::blackbox::check::http resource with the vhost name as the resource title on all the backends.
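To illustrate the per-catalog point with the package example from this comment, here is a minimal sketch. It assumes plain node blocks for the two doc backends; the real setup assigns roles rather than declaring packages directly in node definitions. Each node compiles its own catalog, so the same title can appear once in each. This only covers ordinary resources: exported resources that are collected into a single catalog on another host must still be unique within that catalog.

```puppet
# Sketch only: two nodes, each compiling a separate catalog.
# The title 'ssh-server' appears once per catalog, so there is no conflict.
node 'doc1001.eqiad.wmnet' {
    package { 'ssh-server':
        ensure => present,
    }
}

node 'doc2001.codfw.wmnet' {
    package { 'ssh-server':
        ensure => present,
    }
}
```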

I can't confirm that. We apply the same role class to multiple nodes, and getting duplicate declaration errors due to hardcoded resource names is a common occurrence.

What exact error message are you getting in this case?

Change 890903 merged by Dzahn:

[operations/puppet@production] doc: fix hostname used in http::blackbox monitoring

https://gerrit.wikimedia.org/r/890903

setting both server_name and instance_label explicitly while also making sure the resource title is not the same on multiple instances (by using the server hostname) seems to be a solution to this.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/890903/2/modules/profile/manifests/doc.pp
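As a rough sketch of that approach (not the content of the linked patch), the declaration could look like the following. The parameter names server_name and instance_label come from the class documentation quoted in the description; using the networking facts for the title and label is an assumption made for illustration, and any other parameters the real define may require are omitted.

```puppet
# Sketch only: a profile fragment applied on every backend (doc1001, doc2001).
# The title is the backend's own FQDN, so it differs per host, while
# server_name pins the virtual host used for TLS and the Host: header.
prometheus::blackbox::check::http { $facts['networking']['fqdn']:
    server_name    => 'doc.wikimedia.org',
    instance_label => $facts['networking']['hostname'],
}
```

With this shape the title varies per backend, but every check still requests doc.wikimedia.org.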

Resource names need to be unique on a single host, not globally. (Otherwise you could not for example have package { 'ssh-server': ensure => present } on multiple machines!).

And you can't, if there are multiple hosts and the same resource title is used more than once, in a loop.

example:

$all_hosts.each |Stdlib::Fqdn $other_host| {
...
prometheus::blackbox::check::http { 'foo':

This way it's the same resource more than once on the same hosts (monitoring or rather prometheus hosts where they get realized).

And it's why for example ensure_packages from stdlib exists and is "only install if it doesn't already exist".
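A minimal, self-contained sketch of the loop case follows. It assumes a hypothetical stand-in define, example_check, instead of the real prometheus::blackbox::check::http, and a hypothetical $all_hosts list, so that it can be tried with a plain `puppet apply` and no extra modules. The only point it makes is that a title built from the loop variable stays unique within one catalog, while a fixed title would not.

```puppet
# Hypothetical stand-in for prometheus::blackbox::check::http.
# Like the real define, server_name defaults to the resource title.
define example_check (
    String $server_name = $title,
) {
    notify { "check ${title}: request ${server_name}": }
}

# Hypothetical host list; the names are the backends from the description.
$all_hosts = ['doc1001.eqiad.wmnet', 'doc2001.codfw.wmnet']

$all_hosts.each |String $other_host| {
    # A fixed title such as 'doc.wikimedia.org' here would be declared twice
    # in the same catalog and fail with a duplicate declaration error.
    example_check { "doc.wikimedia.org via ${other_host}":
        server_name => 'doc.wikimedia.org',
    }
}
```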

Dzahn claimed this task.

closing this assuming T330233#8635725 is the expected way to use it / fix and we should simply not let server_name default to $title.

Yes title must be unique, and server_name is generally fine as title as long as we're checking internal services. As you found out this doesn't work too well on public services, we're tracking the issue at T312840 though I haven't had the time/bandwidth to work on it, feedback is welcome on the task too!

I'm also confused as to why prometheus::blackbox::check::http is within a loop on $all_hosts as opposed to declaring it outside of the loop and in the profile, which I'm assuming is going to run on $all_hosts anyways?