Prometheus logs showing errors for routinator
Closed, Resolved · Public

Description

I noticed lots of errors in the Prometheus logs related to Routinator. It appears something may have broken on 2019-06-03 18:00.

The error I see is:

Jun  5 06:44:51 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-06-05T06:44:51.459382635Z caller=scrape.go:835 component="scrape manager" scrape_pool=routinator target=http://rpki1001:9556/metrics msg="append failed" err="invalid metric type \"\""

Event Timeline

jbond triaged this task as Medium priority. Jun 5 2019, 3:33 PM
jbond created this task.
Restricted Application added a subscriber: Aklapper. Jun 5 2019, 3:33 PM
ayounsi added a subscriber: fgiunchedi. Edited Jun 5 2019, 3:38 PM

Data collection stopped after the upgrade to Routinator 0.4.0:
https://grafana.wikimedia.org/d/UwUa77GZk/rpki?refresh=5m&orgId=1&from=now-7d&to=now

ayounsi@rpki1001:~$ curl localhost:9556/metrics
# HELP valid_roas number of valid ROAs seen
# TYPE valid_roas gauge
valid_roas{tal="apnic"} 2940
valid_roas{tal="ripe"} 9929
valid_roas{tal="afrinic"} 289
valid_roas{tal="lacnic"} 2338
valid_roas{tal="arin"} 4800

# HELP vrps_total total number of VRPs seen
# TYPE vrps_total gauge
vrps_total{tal="apnic"} 20425
vrps_total{tal="ripe"} 54026
vrps_total{tal="afrinic"} 425
vrps_total{tal="lacnic"} 6598
vrps_total{tal="arin"} 6250

# HELP last_update_start seconds since last update started
# TYPE gauge
last_update_start 2014

# HELP last_update_duration duration in seconds of last update
# TYPE gauge
last_update_duration 39

# HELP last_update_done seconds since last update finished
# TYPE gauge
last_update_done 1975

# HELP serial current RTR serial number
# TYPE gauge
serial 34

I don't know enough about Prometheus, but if the issue is from Routinator I can open a ticket upstream.

Yes, that looks like an error on the Routinator side. You can also use promtool check metrics to see what Prometheus makes of that:

prometheus1003:~$ curl -s http://rpki1001:9556/metrics  | promtool check metrics
error while linting: text format parsing error in line 31: unexpected end of input stream

My hunch is a missing newline at the end of the file.
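
Either suspicion can be checked offline against a saved copy of the scrape output. What follows is only a minimal sketch, not part of promtool or Routinator, and the helper name `find_exposition_problems` is hypothetical. It assumes the Prometheus text exposition format, in which a type line has the shape `# TYPE <metric_name> <type>`. It reports type lines that lack a metric name or use an unknown type, and it also reports a missing trailing newline.

```python
import re

# Metric types defined by the Prometheus text exposition format.
VALID_TYPES = {"counter", "gauge", "histogram", "summary", "untyped"}

# A well-formed type line: "# TYPE <metric_name> <type>".
TYPE_LINE = re.compile(r"^#\s+TYPE\s+([a-zA-Z_:][a-zA-Z0-9_:]*)\s+(\S+)\s*$")


def find_exposition_problems(text):
    """Return a list of (line_number, message) for suspicious lines.

    Hypothetical helper for illustration only: it checks TYPE lines and
    the trailing newline, not the full exposition format.
    """
    problems = []
    lines = text.split("\n")
    for number, line in enumerate(lines, start=1):
        # Only lines that start like a TYPE declaration are inspected.
        if not re.match(r"^#\s+TYPE\b", line):
            continue
        match = TYPE_LINE.match(line)
        if match is None:
            problems.append((number, "TYPE line needs a metric name and a type"))
        elif match.group(2) not in VALID_TYPES:
            problems.append((number, "unknown metric type: " + match.group(2)))
    # Non-empty input that does not end in a newline is reported last.
    if text and not text.endswith("\n"):
        problems.append((len(lines), "missing newline at end of input"))
    return problems
```

Fed the three lines `# HELP serial current RTR serial number`, `# TYPE gauge`, `serial 34` with no final newline, this flags line 2 for the missing metric name and line 3 for the missing newline. In the output quoted above, the last four type lines read `# TYPE gauge`, so a check like this would flag each of them; whether that or the newline is what Prometheus trips over is not established here.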

ayounsi claimed this task. Jun 5 2019, 4:04 PM
fgiunchedi added a comment. Edited Jun 5 2019, 4:29 PM

Unrelated to the issue at hand, but I'd also recommend that upstream prefix its metrics with routinator_ so it is clear where they come from.
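
To illustrate what the suggested naming would look like, here is a small sketch that rewrites exposition text so that every metric name carries the `routinator_` prefix. The helper `add_prefix` is hypothetical and purely for illustration; neither Routinator nor Prometheus provides it, and the suggestion is for upstream to emit prefixed names directly.

```python
import re

# Matches the metric name at the start of a sample line, or the name that
# follows "# HELP" / "# TYPE" in a metadata line.
NAME = re.compile(r"^(#\s+(?:HELP|TYPE)\s+)?([a-zA-Z_:][a-zA-Z0-9_:]*)")


def add_prefix(text, prefix="routinator_"):
    """Prefix every metric name in Prometheus text exposition output.

    Hypothetical post-processing helper; assumes well-formed input.
    """

    def rename(match):
        lead = match.group(1) or ""
        name = match.group(2)
        if name.startswith(prefix):
            return match.group(0)  # already prefixed, leave untouched
        return lead + prefix + name

    renamed = []
    for line in text.split("\n"):
        is_metadata = re.match(r"^#\s+(?:HELP|TYPE)\s", line) is not None
        if line.startswith("#") and not is_metadata:
            renamed.append(line)  # ordinary comment, keep as-is
        else:
            renamed.append(NAME.sub(rename, line, count=1))
    return "\n".join(renamed)
```

With this, `valid_roas{tal="apnic"} 2940` becomes `routinator_valid_roas{tal="apnic"} 2940`, and the matching HELP and TYPE lines are renamed the same way.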

fgiunchedi moved this task from Inbox to Radar on the observability board. Jun 7 2019, 12:24 PM
ayounsi closed this task as Resolved. Jul 25 2019, 2:05 AM

Fixed with the latest upgrade of Routinator.