RRDP status alert
Closed, Resolved · Public

Authored By

	jijiki
	Feb 13 2020, 9:26 AM

Description

We have an RRDP status alert firing, and in Grafana we get https://rrdp.ripe.net/notification.xml=-1

Related Objects

Mentioned In: T247759: eqiad squid performances issue
T245176: Add Prometheus Squid exporter
Mentioned Here: T252010: Upgrade Routinator 3000 to 0.7.0

Event Timeline

jijiki created this task. Feb 13 2020, 9:26 AM

Restricted Application added a project: SRE. Feb 13 2020, 9:26 AM

Restricted Application added a subscriber: Aklapper.

jbond triaged this task as Medium priority. Feb 13 2020, 11:48 AM

jbond subscribed.

I think this means that the query to that URL times out.
As it completes properly from codfw I'm wondering if it's not an issue with the webproxies (overloaded or similar).

Any idea who can help look into it?

ayounsi mentioned this in T245176: Add Prometheus Squid exporter. Feb 13 2020, 5:13 PM

In T245121#5881546, @ayounsi wrote:

> I think this means that the query to that URL times out.
> As it completes properly from codfw I'm wondering if it's not an issue with the webproxies (overloaded or similar).
>
> Any idea who can help look into it?

akosiaris@cumin2001:~$ curl -x webproxy.codfw.wmnet:8080 https://rrdp.ripe.net/notification.xml=-1
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
akosiaris@cumin2001:~$ curl -x webproxy.eqiad.wmnet:8080 https://rrdp.ripe.net/notification.xml=-1
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

This 404s as well for me currently. So I guess the webproxies are fine?

https://rrdp.ripe.net/notification.xml=-1

It works without the =-1 (i.e. https://rrdp.ripe.net/notification.xml). Is that a typo?

It's because Grafana shows Routinator's fetches of https://rrdp.ripe.net/notification.xml as -1 on its graph, where I think -1 means a timeout.

So the correct URL is https://rrdp.ripe.net/notification.xml indeed.
See https://grafana.wikimedia.org/d/UwUa77GZk/rpki?orgId=1&from=now-24h&to=now&fullscreen&panelId=56

vs. codfw:
https://grafana.wikimedia.org/d/UwUa77GZk/rpki?orgId=1&from=now-24h&to=now&fullscreen&panelId=55
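To compare the two sites by hand, one could fetch the correct URL through each site's webproxy with a hard timeout and print the HTTP code, or -1 when the request times out. This is a minimal sketch that assumes, as suggested above, that -1 stands for a timeout; the function name probe_rrdp and the 30-second default are hypothetical and are not how Routinator itself fetches repositories.

```shell
#!/bin/sh
# Hypothetical helper: fetch a URL through a proxy and print its HTTP status code,
# or -1 if curl gives up because the timeout was reached.
# Assumption: -1 on the Grafana panel means "timed out".
probe_rrdp() {
    proxy="$1"
    url="$2"
    timeout="${3:-30}"   # illustrative default in seconds, not Routinator's setting
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "$timeout" -x "$proxy" "$url")
    rc=$?
    if [ "$rc" -eq 28 ]; then
        echo -1          # curl exit code 28 = operation timed out
    else
        echo "$code"     # other curl failures (e.g. connection refused) print 000
    fi
}
```

For example, `probe_rrdp webproxy.eqiad.wmnet:8080 https://rrdp.ripe.net/notification.xml` and the same call with `webproxy.codfw.wmnet:8080` should print 200 from both sites when the proxies are healthy; a -1 only from eqiad would point at the eqiad proxy path rather than at RIPE.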

As it keeps flapping, I (temporarily) disabled the alert in eqiad; we can rely on codfw if there is an actual issue.

ayounsi mentioned this in T247759: eqiad squid performances issue. Mar 16 2020, 2:49 PM

Routinator was upgraded in T252010, which helped remove the "dubious" targets.
Since this task was opened, the proxies have been moved to new hosts and their performance has increased.
Alerting has been tuned to trigger only on HTTP codes > 399. As it's not possible to control the repositories we connect to, there will always be a risk of alerts.
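The tuned threshold can be illustrated with a small predicate. This is only a sketch: the actual alert definition is not shown in this task, and should_alert is a hypothetical name. It shows which values would still page under a "> 399" rule: 4xx and 5xx responses do, while successful or redirect codes and the -1 timeout value do not.

```shell
#!/bin/sh
# Hypothetical sketch of the tuned alert condition:
# succeed (alert) only when the reported status is an HTTP error code (400 or above).
# A -1 (timeout) or a 2xx/3xx code does not trigger.
should_alert() {
    status="$1"
    [ "$status" -gt 399 ]
}
```

The design trade-off is that transient timeouts against repositories outside our control no longer flap the alert, at the cost of not being paged for a repository that only ever times out.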
