RRDP status alert
Closed, Resolved · Public

Description

We have an alert for RRDP status, where Grafana shows https://rrdp.ripe.net/notification.xml=-1

Event Timeline

Restricted Application added a subscriber: Aklapper.
jbond triaged this task as Medium priority. Feb 13 2020, 11:48 AM
jbond subscribed.

I think this means that the query to that URL times out.
As it completes properly from codfw, I'm wondering whether it's an issue with the webproxies (overloaded or similar).

Any idea who can help look into it?

akosiaris@cumin2001:~$ curl -x webproxy.codfw.wmnet:8080 https://rrdp.ripe.net/notification.xml=-1
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
akosiaris@cumin2001:~$ curl -x webproxy.eqiad.wmnet:8080 https://rrdp.ripe.net/notification.xml=-1
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

This 404s as well for me currently. So I guess the webproxies are fine?

As it keeps flapping, I (temporarily) disabled the alert in eqiad, and we can rely on codfw if there is an actual issue.

ayounsi claimed this task.
  • Routinator was upgraded in T252010, which helped remove the "dubious" targets.
  • Since this task was opened, the proxies have been moved to new hosts and performance has increased.
  • Alerting has been tuned to trigger only on HTTP status codes > 399. As we cannot control the repositories we connect to, there will always be a risk of alerts.
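
For illustration, the tuned rule in the last bullet could be sketched as below. This is only a minimal sketch of the "HTTP code > 399" condition, not the actual alert definition: the function names and the sample status map are hypothetical, and it assumes that -1 is the value reported when no HTTP response was received (the timeout suspected earlier in this task).

```python
from typing import Dict, List


def should_alert(status_code: int) -> bool:
    """Return True only for HTTP status codes above 399 (4xx and 5xx).

    Assumption: -1 stands for "no HTTP response" (e.g. a timeout).
    Under this rule it does not trigger an alert.
    """
    return status_code > 399


def alerting_targets(statuses: Dict[str, int]) -> List[str]:
    """Return the sorted list of targets whose status code should alert."""
    return sorted(url for url, code in statuses.items() if should_alert(code))


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the real dashboard.
    sample = {
        "https://rrdp.ripe.net/notification.xml": -1,
        "https://example.org/notification.xml": 404,
    }
    print(alerting_targets(sample))
```

With a rule like this, a flapping -1 for a single repository no longer triggers an alert, while a repository that returns an explicit 4xx or 5xx still does.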