paws prometheus no longer 'trusted' in metricsinfra::alertmanager
Closed, ResolvedPublic

Assigned To

Authored By

	Andrew
	Feb 26 2024, 7:18 PM

Description

The metricsinfra config includes this bit:

profile::wmcs::metricsinfra::alertmanager::project_proxy::trusted_hosts:
  paws:
  - paws-prometheus-1.paws.eqiad1.wikimedia.cloud
  - paws-prometheus-2.paws.eqiad1.wikimedia.cloud

That value is used to 'Configure an Apache vhost that lets other trusted projects submit requests to the alertmanager api.'

Those hosts no longer exist, which means that a) puppet is broken and b) the new k8s-hosted prometheus is probably no longer trusted.

Fixing puppet is easy, I can just remove that config section. This task is about figuring out what we've lost, and if there's a way to get it back without necessarily knowing the IPs of the new prometheus nodes.
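
A stale entry like the ones above could in principle be spotted before it breaks puppet. The following is only a minimal sketch, assuming the breakage comes from hostnames that no longer resolve (the task does not state the exact failure). `find_stale_hosts` and its `resolve` parameter are hypothetical names for illustration, not part of the metricsinfra puppet code:

```python
import socket
from typing import Callable, Dict, List


def find_stale_hosts(
    trusted_hosts: Dict[str, List[str]],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, List[str]]:
    """Return, per project, the trusted hosts that fail to resolve.

    Hypothetical helper: `resolve` is injectable so the check can be
    exercised without real DNS lookups.
    """
    stale: Dict[str, List[str]] = {}
    for project, hosts in trusted_hosts.items():
        for host in hosts:
            try:
                resolve(host)
            except OSError:  # socket.gaierror is a subclass of OSError
                stale.setdefault(project, []).append(host)
    return stale


# Same shape as the hiera value quoted in the description.
TRUSTED_HOSTS = {
    "paws": [
        "paws-prometheus-1.paws.eqiad1.wikimedia.cloud",
        "paws-prometheus-2.paws.eqiad1.wikimedia.cloud",
    ],
}

if __name__ == "__main__":
    for project, hosts in find_stale_hosts(TRUSTED_HOSTS).items():
        print(f"{project}: {', '.join(hosts)} no longer resolve")
```

This only flags entries that point at hosts which are gone. It does not address the second half of the question, namely how to trust Prometheus instances whose addresses are not known in advance.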

Related Objects

Mentioned Here: T356429: Remove paws-prometheus-[12]

Event Timeline

Andrew created this task. Feb 26 2024, 7:18 PM

Restricted Application added a project: cloud-services-team. Feb 26 2024, 7:18 PM

As far as I know, the prometheus nodes that I removed in T356429 hadn't been doing anything for some time (a year or more).

taavi removed a parent task: T304716: Cloud services enhancement proposal: Prometheus metrics for Toolforge/Toolsbeta/Paws Kubernetes clusters. Feb 26 2024, 7:24 PM

taavi edited projects, added Cloud-VPS, PAWS; removed User-dcaro, Cloud Services Proposals.

Please remove that bit of config. This config is used to configure access for sending alerts via metricsinfra-alertmanager. The mechanism was designed primarily for Toolforge, and PAWS was included because at the time the Kubernetes setup there was adapted from Toolforge's Kubeadm+Puppet setup. I suspect it was never actually used to send an alert in PAWS because, if my memory is correct, the mechanism for deploying alert rules was introduced only after PAWS moved to Magnum.

If PAWS wants to send alerts in the future via the current Prometheus setup there, we would need to rethink the current authentication setup, as the Prometheus pods there run on normal K8s worker nodes.

Please remove that bit of config.

Done, and puppet is happy again. I will stop thinking about this!
