
cloudmetrics1001 is unreachable, preventing integraality webserver to start
Closed, Resolved
Public

Description

Checking the logs:

Shell logs (since 2022-11-20):

Sending metric toolforge.integraality.periodic_update.wikidata:86081000 to cloudmetrics1001.eqiad.wmnet:8125
nc: getaddrinfo for host "cloudmetrics1001.eqiad.wmnet" port 8125: Name or service not known

uWSGI logs:

Traceback (most recent call last):
  File "/data/project/integraality/www/python/src/app.py", line 10, in <module>
    from pages_processor import PagesProcessor, ProcessingException
  File "./pages_processor.py", line 16, in <module>
    from property_statistics import PropertyStatistics, QueryException
  File "./property_statistics.py", line 16, in <module>
    from statsd.defaults.env import statsd
  File "/data/project/integraality/www/python/venv/lib/python3.7/site-packages/statsd/defaults/env.py", line 17, in <module>
    maxudpsize=maxudpsize, ipv6=ipv6)
  File "/data/project/integraality/www/python/venv/lib/python3.7/site-packages/statsd/client/udp.py", line 35, in __init__
    host, port, fam, socket.SOCK_DGRAM)[0]
  File "/usr/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):

This all appears to be caused by T297444: decommission cloudmetrics100[1-2].eqiad.wmnet.

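The weekly cron'd update does not care too much (as per the logs), but crazily enough, the pystatsd library just crashes if the statsd host cannot be resolved: the client is built when statsd.defaults.env is imported, so the failed lookup propagates up through property_statistics.py and pages_processor.py to app.py, and the webserver never starts (see https://github.com/jsocol/pystatsd/issues/130).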
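One possible way to keep the webserver up when the metrics host goes away is to build the client behind a guard and fall back to a no-op object. This is only a sketch, not necessarily what was deployed: NullStatsClient and make_stats_client are hypothetical names, and it assumes the app only uses simple calls such as incr, gauge or timing (a no-op that returns None would not work as a `with statsd.timer(...)` context manager).

```python
import socket


class NullStatsClient:
    """No-op stand-in: any method call (incr, gauge, timing, ...) is accepted and dropped."""

    def __getattr__(self, name):
        def _noop(*args, **kwargs):
            return None
        return _noop


def make_stats_client(factory):
    """Build the real client, or fall back to a no-op one if the host cannot be resolved."""
    try:
        return factory()
    except socket.gaierror:
        # Name resolution failed (e.g. the statsd host was decommissioned).
        return NullStatsClient()


# Hypothetical usage in place of `from statsd.defaults.env import statsd`:
#   import statsd as statsd_lib
#   statsd = make_stats_client(lambda: statsd_lib.StatsClient(host, port))
```

With this, a missing statsd host silently costs the metrics but no longer takes the import chain down with it.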

Event Timeline

JeanFred renamed this task from "cloudmetrics1001 is unreachable from intgraality, cascading failure" to "cloudmetrics1001 is unreachable, preventing integraality webserver to start". (Dec 26 2022, 10:21 AM)
JeanFred triaged this task as High priority.

Mentioned in SAL (#wikimedia-cloud) [2022-12-26T10:48:34Z] <wm-bot> <jeanfred> Deploy c90f9ef (T325936)