
LVSRealserverMSS alert is broken for ferm-based hosts
Closed, Resolved, Public

Description

The alert expression looks like this:

sum by (hostname, protocol, endpoint, cluster) (
  label_replace(lvs_realserver_mss_value, "hostname", "$1", "instance", "(.*):.*")
)
/ ignoring (endpoint) group_left
sum by (hostname, protocol, cluster) (
  label_replace(tcp_mss_clamper_mss_cfg{interface="lo"}, "hostname", "$1", "instance", "(.*):.*")
) != 1

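On ferm-based hosts, lvs_realserver_mss_value is still reported as expected, but tcp_mss_clamper_mss_cfg is not. Because the "/ ignoring (endpoint) group_left" division only returns left-hand series that have a matching right-hand series, the expression evaluates to an empty result on those hosts, so the alert can never fire there, even if the MSS is misconfigured.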
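To make the failure mode concrete, here is a small Python simulation of the expression above. It is only a sketch, not how Prometheus evaluates queries: add_hostname, sum_by, divide_ignoring_group_left and mss_alert are hypothetical helpers that mimic just the parts of PromQL vector matching this alert relies on, and the sample label values are made up.

```python
import re
from collections import defaultdict

# Rough, hypothetical simulation of the LVSRealserverMSS expression.
# A series is a (labels: dict, value: float) pair.

def add_hostname(series):
    """Mimic label_replace(..., "hostname", "$1", "instance", "(.*):.*")."""
    out = []
    for labels, value in series:
        new = dict(labels)
        m = re.fullmatch(r"(.*):.*", labels.get("instance", ""))
        if m:  # PromQL regexes are fully anchored, hence fullmatch
            new["hostname"] = m.group(1)
        out.append((new, value))
    return out

def sum_by(series, by):
    """Mimic `sum by (...)`: keep only the `by` labels and add values per group."""
    groups = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, labels[k]) for k in by if k in labels))
        groups[key] += value
    return [(dict(key), total) for key, total in groups.items()]

def divide_ignoring_group_left(left, right, ignoring):
    """Mimic `left / ignoring (...) group_left right`.

    Each left series is paired with the right series whose labels are equal once
    the `ignoring` labels are dropped. Left series with no partner are dropped."""
    def match_key(labels):
        return tuple(sorted((k, v) for k, v in labels.items() if k not in ignoring))
    rhs = {match_key(labels): value for labels, value in right}
    return [(labels, value / rhs[match_key(labels)])
            for labels, value in left if match_key(labels) in rhs]

def mss_alert(realserver_mss, clamper_cfg):
    """Series that would fire: realserver MSS differs from the clamper config on lo."""
    left = sum_by(add_hostname(realserver_mss), ["hostname", "protocol", "endpoint", "cluster"])
    on_lo = [s for s in clamper_cfg if s[0].get("interface") == "lo"]
    right = sum_by(add_hostname(on_lo), ["hostname", "protocol", "cluster"])
    ratio = divide_ignoring_group_left(left, right, {"endpoint"})
    return [(labels, v) for labels, v in ratio if v != 1]  # the trailing "!= 1"

# Made-up sample: one realserver reporting MSS 1400 for one endpoint.
mss = [({"instance": "rs1:9100", "protocol": "ipv4", "endpoint": "a", "cluster": "c"}, 1400.0)]
bad_cfg = [({"instance": "rs1:9100", "protocol": "ipv4", "cluster": "c", "interface": "lo"}, 1440.0)]

print(len(mss_alert(mss, bad_cfg)))  # mismatch on a clamper host: 1 series fires
print(mss_alert(mss, []))            # ferm host, no clamper metric: [] (alert is blind)
```

With a mismatched clamper value the simulated alert returns one series; with no tcp_mss_clamper_mss_cfg series at all it returns nothing, which is indistinguishable from a healthy host.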

Event Timeline

Vgutierrez triaged this task as Medium priority (Jun 26 2024, 2:34 PM).
Vgutierrez moved this task from Backlog to Scheduled incidental work on the Traffic board.

Change #1062457 had a related patch set uploaded (by CDobbins; author: CDobbins):

[operations/puppet@production] prometheus: add script to check TCP MSS clamping value

https://gerrit.wikimedia.org/r/1062457

Change #1082553 had a related patch set uploaded (by CDobbins; author: CDobbins):

[operations/puppet@production] prometheus: add script to check TCP MSS clamping value

https://gerrit.wikimedia.org/r/1082553

Change #1082553 abandoned by CDobbins:

[operations/puppet@production] prometheus: add script to check TCP MSS clamping value

Reason:

created by mistake

https://gerrit.wikimedia.org/r/1082553

Change #1062457 merged by CDobbins:

[operations/puppet@production] prometheus: add script to check TCP MSS clamping value

https://gerrit.wikimedia.org/r/1062457
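The contents of change 1062457 are not reproduced in this task. As a rough, hypothetical illustration of what a textfile-style check could look like, the Python sketch below reads iptables-save output, picks out TCPMSS --set-mss rules, and renders them as a gauge in the Prometheus text exposition format. Only the metric name ferm_mss_cfg comes from change 1110843; reading iptables-save, the chain and interface labels, and the helper names parse_tcpmss_rules and render_textfile are assumptions, not the merged script.

```python
import re

def parse_tcpmss_rules(iptables_save_output):
    """Map (chain, out_interface) -> MSS for each TCPMSS --set-mss rule.

    Hypothetical sketch: if several rules share a chain and interface,
    the first one wins (an arbitrary choice made here)."""
    rules = {}
    for line in iptables_save_output.splitlines():
        if not line.startswith("-A ") or "-j TCPMSS" not in line:
            continue
        mss = re.search(r"--set-mss (\d+)", line)
        if not mss:
            continue  # e.g. --clamp-mss-to-pmtu rules carry no fixed value
        chain = line.split()[1]
        iface = re.search(r"(?:^|\s)-o (\S+)", line)
        key = (chain, iface.group(1) if iface else "")
        rules.setdefault(key, int(mss.group(1)))
    return rules

def render_textfile(rules, metric="ferm_mss_cfg"):
    """Render the rules as one gauge sample per (chain, interface) pair."""
    out = [f"# HELP {metric} TCP MSS value set by TCPMSS --set-mss rules.",
           f"# TYPE {metric} gauge"]
    for (chain, iface), value in sorted(rules.items()):
        out.append(f'{metric}{{chain="{chain}",interface="{iface}"}} {value}')
    return "\n".join(out) + "\n"

# Made-up iptables-save excerpt for illustration.
sample = """*mangle
-A OUTPUT -o lo -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1440
-A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
COMMIT
"""
print(render_textfile(parse_tcpmss_rules(sample)), end="")
```

Under these assumptions, a metric like this would give ferm-based hosts a right-hand side comparable to what tcp_mss_clamper_mss_cfg provides elsewhere.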

Change #1110843 had a related patch set uploaded (by CDobbins; author: CDobbins):

[operations/alerts@master] alerts: add alert for ferm_mss_cfg Prometheus metric

https://gerrit.wikimedia.org/r/1110843

Change #1110843 merged by jenkins-bot:

[operations/alerts@master] alerts: add alert for ferm_mss_cfg Prometheus metric

https://gerrit.wikimedia.org/r/1110843