Icinga alerts mention the wrong data center
Closed, Resolved (Public)

Description

Today's sessionstore alert was a codfw issue only, but the alert mentions both eqiad and codfw:

PROBLEM - LVS sessionstore codfw port 8081/tcp - Session store- sessionstore.svc.eqiad.wmnet IPv4 #page on sessionstore.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems

That's because in hieradata/common/service.yaml we say:

sessionstore:
  description: Session store, sessionstore.svc.%{::site}.wmnet

Icinga uses that description, but in that context %{::site} resolves to the site where Icinga is running, not the site of the sessionstore service being checked. The resulting alert text is confusing and can mislead people into looking for problems in the wrong data center. We should fix this and other instances so the alert text is clear about where the problem is.
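To make the failure mode concrete, here is a minimal Python sketch of what happens when the description is interpolated in the scope of the monitoring host rather than the scope of the service. It is a toy stand-in, not Hiera's or Icinga's actual code: the `interpolate` and `check_description` helpers, the `icinga_scope` dictionary, and the choice of eqiad as the Icinga site are all assumptions made for illustration.

```python
import re

# The description string as it appears in hieradata/common/service.yaml.
DESCRIPTION = "Session store, sessionstore.svc.%{::site}.wmnet"


def interpolate(template: str, scope: dict) -> str:
    """Replace %{::name} tokens with values from `scope`.

    Hypothetical helper: a simplified imitation of Hiera-style
    interpolation, only meant to show which scope wins.
    """
    return re.sub(r"%\{::(\w+)\}", lambda m: scope[m.group(1)], template)


# Assumption for illustration: the Icinga host lives in eqiad, so that is
# the value of `site` whenever the description is resolved there.
icinga_scope = {"site": "eqiad"}


def check_description(service_site: str) -> str:
    """Build a check label for a service in `service_site`.

    The description is resolved with the Icinga host's scope, so the
    service's own site never reaches the interpolated hostname.
    """
    resolved = interpolate(DESCRIPTION, icinga_scope)
    return f"LVS sessionstore {service_site} - {resolved}"


# The check targets codfw, but the label still names the eqiad hostname.
print(check_description("codfw"))
```

In this sketch, resolving the same template with `{"site": "codfw"}` would produce the codfw hostname, which is why the text is only correct when the interpolation scope happens to match the site being checked.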

Event Timeline

Change 695870 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] service::catalog: stop using %{::site} in interpolations

https://gerrit.wikimedia.org/r/695870

Change 695870 merged by Giuseppe Lavagetto:

[operations/puppet@production] service::catalog: fix the use of %{::site} in interpolations

https://gerrit.wikimedia.org/r/695870

Joe claimed this task.