paws prometheus no longer 'trusted' in metricsinfra::alertmanager
Closed, Resolved (Public)

Description

The metricsinfra config includes this bit:

profile::wmcs::metricsinfra::alertmanager::project_proxy::trusted_hosts:
  paws:
  - paws-prometheus-1.paws.eqiad1.wikimedia.cloud
  - paws-prometheus-2.paws.eqiad1.wikimedia.cloud

That value is used to 'Configure an Apache vhost that lets other trusted projects submit requests to the alertmanager api.'

Those hosts no longer exist, which means that a) puppet is broken and b) the new k8s-hosted prometheus is probably no longer trusted.
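As a rough illustration of how stale entries like these could be spotted, here is a minimal sketch that checks whether each configured host still resolves in DNS. The function names (`resolves`, `find_stale_hosts`) and the `TRUSTED_HOSTS` dict are hypothetical; the dict simply mirrors the hiera value quoted above, and nothing here is part of the actual Puppet profile.

```python
import socket
from typing import Callable, Dict, List


def resolves(hostname: str) -> bool:
    """Return True if the hostname currently resolves in DNS."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def find_stale_hosts(
    trusted_hosts: Dict[str, List[str]],
    is_alive: Callable[[str], bool] = resolves,
) -> Dict[str, List[str]]:
    """Map each project to the trusted hosts that fail the is_alive check.

    Projects whose hosts all pass the check are left out of the result.
    """
    stale: Dict[str, List[str]] = {}
    for project, hosts in trusted_hosts.items():
        missing = [host for host in hosts if not is_alive(host)]
        if missing:
            stale[project] = missing
    return stale


# Hypothetical copy of the hiera value from the task description.
TRUSTED_HOSTS = {
    "paws": [
        "paws-prometheus-1.paws.eqiad1.wikimedia.cloud",
        "paws-prometheus-2.paws.eqiad1.wikimedia.cloud",
    ],
}

if __name__ == "__main__":
    for project, hosts in find_stale_hosts(TRUSTED_HOSTS).items():
        for host in hosts:
            print(f"{project}: {host} does not resolve")
```

The `is_alive` parameter is only there so the check can be swapped out (for example in a test, or for a different notion of "exists" than DNS resolution).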

Fixing puppet is easy, I can just remove that config section. This task is about figuring out what we've lost, and if there's a way to get it back without necessarily knowing the IPs of the new prometheus nodes.

Event Timeline

So far as I know, the prometheus nodes that I removed in T356429 hadn't been doing anything for some time (a year or more).

Please remove that bit of config. This config is used to configure access for sending alerts via metricsinfra-alertmanager. The mechanism was designed primarily for Toolforge, and PAWS was included because at the time the Kubernetes setup there was adapted from Toolforge's Kubeadm+Puppet setup. I suspect it was never actually used to send an alert in PAWS, as the mechanism for deploying alert rules was introduced only after PAWS moved to Magnum, if my memory is correct.

If PAWS wants to send alerts in the future via its current Prometheus setup, we would need to rethink the current authentication setup, as the Prometheus pods there run on normal K8s worker nodes.

Please remove that bit of config.

Done, and puppet is happy again. I will stop thinking about this!