prometheus::blackbox::check::http - allow checking the same virtual host on multiple backends
Closed, ResolvedPublic

Authored By

	Dzahn
	Feb 21 2023, 8:41 PM

Description

current status:

In prometheus::blackbox::check::http, which is used for monitoring various misc. services, there is the parameter $server_name, which defaults to $title.

It is described as # @param server_name - an FQDN, the server name to use (during TLS and Host:).

separate from that we have # @param instance_label - short-form host name, used as an instance label.

problem statement:

Let's say you have multiple backends serving a single virtual host, example:

virtual host: doc.wikimedia.org
machines: doc1001.eqiad.wmnet, doc2001.codfw.wmnet

which is a very common type of setup.

Now if you want to monitor doc.wikimedia.org you have the following options:

use "doc.wikimedia.org" as the resource title but tell puppet with an 'if-then-else' to only monitor the "active host" or one of the 2 hosts

use "doc.wikimedia.org" as the resource title and apply it on both instances; the puppet run will then fail with a "duplicate declaration" error because the same $title is used more than once

use the instance name as $title, but now requests go to doc1001.eqiad.wmnet / doc2001.codfw.wmnet and you are not actually monitoring doc.wikimedia.org

suggested fix:

Have separate parameters for "virtual host" and "instance FQDN" so we can truly check "virtual host X on host Y and host Z".

Just like when you manually use curl you also have separate parameters for a virtual host you are requesting and a host you are requesting it from.

Details

	Subject	Repo	Branch	Lines +/-
	doc: fix hostname used in http::blackbox monitoring	operations/puppet	production	+3 -1

Related Objects

Status	Assigned	Task
Resolved	Dzahn	T327772 Automatically created ProbeDown tasks for Collab services
Resolved	Dzahn	T329587 http::blackbox monitoring for all https services (let all serviceops-collab alertmanager alerts create tickets)
Resolved	Dzahn	T327973 create blackbox::http monitoring for doc.wikimedia.org
Resolved	Dzahn	T330233 prometheus::blackbox::check::http - allow checking the same virtual host on multiple backends

Event Timeline

Dzahn created this task. Feb 21 2023, 8:41 PM

Restricted Application added a subscriber: Aklapper. Feb 21 2023, 8:41 PM

Dzahn updated the task description. Feb 21 2023, 8:42 PM

Dzahn updated the task description. Feb 21 2023, 8:45 PM

Dzahn added a parent task: T327973: create blackbox::http monitoring for doc.wikimedia.org.

Or is the idea that I should set $server_name explicitly and just avoid that it defaults to $title, so that my $title can vary but $server_name stays the same?

Change 890903 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] doc: fix hostname used in http::blackbox monitoring

https://gerrit.wikimedia.org/r/890903

gerritbot added a project: Patch-For-Review. Feb 21 2023, 8:53 PM

use "doc.wikimedia.org" as the resource title and apply it on both instances, puppet run will fail with a "duplicate declaration" error because the same $title is used more than once

Resource names need to be unique on a single host, not globally. (Otherwise you could not for example have package { 'ssh-server': ensure => present } on multiple machines!). You can simply have a prometheus::blackbox::check::http resource with the vhost name as the resource title on all the backends.

I can't confirm that. We are applying the same role class to multiple nodes, and getting duplicate declaration errors due to hardcoded resource names is a common occurrence.

What exact error message are you getting in this case?

Change 890903 merged by Dzahn:

[operations/puppet@production] doc: fix hostname used in http::blackbox monitoring

https://gerrit.wikimedia.org/r/890903

Maintenance_bot removed a project: Patch-For-Review. Feb 21 2023, 11:10 PM

setting both server_name and instance_label explicitly while also making sure the resource title is not the same on multiple instances (by using the server hostname) seems to be a solution to this.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/890903/2/modules/profile/manifests/doc.pp
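
For readers who do not want to open the patch, the approach described above can be sketched roughly as follows. This is an illustration, not the merged change: only server_name and instance_label are taken from this task, the use of the networking.hostname fact for the title is an assumption, and the real define may need further parameters that are not shown.

```puppet
# Sketch only: parameter names come from this task; the real define may
# require further parameters that are not shown here.
#
# The title is the backend's own short hostname (e.g. doc1001, doc2001),
# so it differs on every backend and never collides.
prometheus::blackbox::check::http { $facts['networking']['hostname']:
  # The virtual host that is actually requested (TLS server name and Host:).
  server_name    => 'doc.wikimedia.org',
  # Set explicitly instead of relying on a default.
  instance_label => $facts['networking']['hostname'],
}
```

With this shape, server_name stays the same on every backend while the title varies, which is the "virtual host X on host Y and host Z" check asked for in the description.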

In T330233#8634821, @taavi wrote:

Resource names need to be unique on a single host, not globally. (Otherwise you could not for example have package { 'ssh-server': ensure => present } on multiple machines!).

And you can't, if there are multiple hosts and the same resource title is used more than once, in a loop.

example:

$all_hosts.each |Stdlib::Fqdn $other_host| {
...
prometheus::blackbox::check::http { 'foo':

This way it's the same resource more than once on the same hosts (monitoring or rather prometheus hosts where they get realized).

And it's why for example ensure_packages from stdlib exists and is "only install if it doesn't already exist".
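
The distinction under discussion (unique per catalog versus unique globally) can be reproduced with a small stand-alone manifest. This is a sketch under stated assumptions: the core notify type stands in for prometheus::blackbox::check::http, the 'foo' titles are placeholders, and it only covers resources declared in a single catalog, not exported resources collected on another host.

```puppet
# Sketch only: 'notify' stands in for prometheus::blackbox::check::http.
$all_hosts = ['doc1001.eqiad.wmnet', 'doc2001.codfw.wmnet']

# Fails at compile time if uncommented: the title 'foo' is declared once
# per loop iteration, so one catalog would contain Notify[foo] twice.
# $all_hosts.each |String $other_host| {
#   notify { 'foo': }
# }

# Compiles: interpolating the loop variable makes every title unique
# within the catalog.
$all_hosts.each |String $other_host| {
  notify { "foo-${other_host}":
    message => "would check doc.wikimedia.org on ${other_host}",
  }
}
```

Declaring the same title once on each of several nodes is a different case: every node gets its own catalog, so those declarations do not conflict.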

closing this assuming T330233#8635725 is the expected way to use it / fix and we should simply not let server_name default to $title.

Yes title must be unique, and server_name is generally fine as title as long as we're checking internal services. As you found out this doesn't work too well on public services, we're tracking the issue at T312840 though I haven't had the time/bandwidth to work on it, feedback is welcome on the task too!

I'm also confused as to why prometheus::blackbox::check::http is within a loop on $all_hosts as opposed to declaring it outside of the loop and in the profile, which I'm assuming is going to run on $all_hosts anyways?
