prometheus-lvs-realserver-mss crashed on ncredir2002
Open, Medium, Public

Description

Jan 10 08:20:04 ncredir2002 prometheus-lvs-realserver-mss[299254]: [!] Unexpected answer: None
Jan 10 08:20:04 ncredir2002 prometheus-lvs-realserver-mss[299254]: Traceback (most recent call last):
Jan 10 08:20:04 ncredir2002 prometheus-lvs-realserver-mss[299254]:   File "/usr/local/bin/prometheus-lvs-realserver-mss", line 109, in <module>
Jan 10 08:20:04 ncredir2002 prometheus-lvs-realserver-mss[299254]:     gauge.labels(endpoint, f'IPv{version}').set(mss)
Jan 10 08:20:04 ncredir2002 prometheus-lvs-realserver-mss[299254]:   File "/usr/lib/python3/dist-packages/prometheus_client/metrics.py", line 396, in set
Jan 10 08:20:04 ncredir2002 prometheus-lvs-realserver-mss[299254]:     self._value.set(float(value))
Jan 10 08:20:04 ncredir2002 prometheus-lvs-realserver-mss[299254]:                     ^^^^^^^^^^^^
Jan 10 08:20:04 ncredir2002 prometheus-lvs-realserver-mss[299254]: TypeError: float() argument must be a string or a real number, not 'NoneType'

Event Timeline

Vgutierrez triaged this task as Medium priority. Jan 10 2024, 8:28 AM
Vgutierrez moved this task from Backlog to Traffic team actively servicing on the Traffic board.

Apparently get_mss failed to capture a SYN/ACK:

    if synack is None or synack[TCP] is None:
        print(f"[!] Unexpected answer: {synack}", file=sys.stderr)
        return None

what's the best strategy here @fgiunchedi? I see two options:

An additional metric to count failures seems appropriate to me. Since these are exceptional events, even a general counter (i.e. not per endpoint) would work, I think.
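
A minimal sketch of the failure-counter idea, not the content of the merged patch: the helper name `record_mss` and its parameters are hypothetical. It assumes `gauge` behaves like the existing prometheus_client Gauge with two labels, and `failures` like an unlabelled prometheus_client Counter. When get_mss returns None, the gauge is left untouched and the counter is incremented, which avoids the `float(None)` TypeError from the traceback.

```python
def record_mss(gauge, failures, endpoint, version, mss):
    """Update the MSS gauge, or count a failure when no MSS was measured.

    Hypothetical helper: `gauge` is assumed to expose .labels(...).set(...)
    and `failures` is assumed to expose .inc(), as prometheus_client's
    Gauge and Counter do. Returns True if the gauge was updated.
    """
    if mss is None:
        # get_mss() returned None (no usable SYN/ACK was captured).
        # Passing None on to Gauge.set() is what raised the TypeError.
        failures.inc()
        return False

    gauge.labels(endpoint, f'IPv{version}').set(mss)
    return True
```

In the script, the call on line 109 of the traceback could then become `record_mss(gauge, failures, endpoint, version, mss)`, with `failures` created once at startup as a `Counter` under whatever metric name fits the existing naming scheme.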

Change 989459 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] lvs::realserver::ipip: Report errors on MSS monitoring

https://gerrit.wikimedia.org/r/989459

Change 989459 merged by Vgutierrez:

[operations/puppet@production] lvs::realserver::ipip: Report errors on MSS monitoring

https://gerrit.wikimedia.org/r/989459