
Prometheus failing to ingest some mtail samples
Closed, Resolved · Public

Description

Aug  7 14:53:05 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:53:05.996086559Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://centrallog1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=1764
Aug  7 14:53:08 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:53:08.143583353Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://mx1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=17
Aug  7 14:54:05 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:54:05.993059027Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://centrallog1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=1761
Aug  7 14:54:08 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:54:08.147333235Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://mx1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=17
Aug  7 14:55:05 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:55:05.98969224Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://centrallog1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=1759
Aug  7 14:55:08 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:55:08.141738058Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://mx1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=17
Aug  7 14:56:05 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:56:05.986224713Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://centrallog1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=1757
Aug  7 14:56:08 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:56:08.145380319Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://mx1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=17
Aug  7 14:57:05 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:57:05.997663971Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://centrallog1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=1754
Aug  7 14:57:08 prometheus1003 prometheus@ops[2846]: level=warn ts=2019-08-07T14:57:08.143336672Z caller=scrape.go:1094 component="scrape manager" scrape_pool=mtail target=http://mx1001:3903/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=17
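
To get an overview of which targets are affected, the warnings can be aggregated per target straight from the journal of the prometheus@ops unit seen above (the one-hour window is just an example):

prometheus1003:~$ sudo journalctl -u prometheus@ops --since "1 hour ago" \
    | grep 'too old or are too far into the future' \
    | grep -o 'target=[^ ]*' | sort | uniq -c | sort -rn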

Event Timeline

This looks like it's related to the fact that mtail attaches to each metric the timestamp of the last time it saw/updated that particular metric, e.g.

mx1001:~$ curl -s localhost:3903/metrics | awk '$3 ~ /[0-9]/ { print }' | sort -k3 -n | head -10
exim_tls_connections{ciphersuite="TLS1.1:RSA_AES_256_CBC_SHA1:256",prog="exim.mtail",status="in"} 1 1565236340505
exim_tls_connections{ciphersuite="TLS1.2:ECDHE_RSA_AES_128_CBC_SHA1:128",prog="exim.mtail",status="in"} 9 1565281573787
exim_tls_connections{ciphersuite="TLS1.0:DHE_RSA_AES_256_CBC_SHA1:256",prog="exim.mtail",status="out"} 4 1565292025167
exim_tls_connections{ciphersuite="TLS1.1:RSA_AES_256_CBC_SHA1:256",prog="exim.mtail",status="out"} 6 1565384286798
exim_tls_connections{ciphersuite="TLS1.0:RSA_3DES_EDE_CBC_SHA1:192",prog="exim.mtail",status="in"} 22 1565525440450
exim_tls_connections{ciphersuite="TLS1.2:DHE_RSA_CHACHA20_POLY1305:256",prog="exim.mtail",status="out"} 17 1565623119407
exim_tls_connections{ciphersuite="TLS1.2:RSA_AES_256_CBC_SHA1:256",prog="exim.mtail",status="in"} 215 1565663261489
exim_tls_connections{ciphersuite="TLS1.1:ECDHE_RSA_AES_128_CBC_SHA1:128",prog="exim.mtail",status="out"} 1 1565675055217
exim_tls_connections{ciphersuite="TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128",prog="exim.mtail",status="in"} 72 1565677145371
exim_tls_connections{ciphersuite="TLS1.1:ECDHE_RSA_AES_256_CBC_SHA1:256",prog="exim.mtail",status="in"} 101 1565682896514

And indeed, the first metric above was last updated a few days ago:

$ LANG=C date --date=@1565236340
Thu Aug  8 05:52:20 CEST 2019
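
To get a rough idea of how many exposed samples are old enough to be at risk of being dropped, something like this counts samples whose attached timestamp (third field, in milliseconds) is more than an hour behind the wall clock; the one-hour threshold is only illustrative, not Prometheus's actual ingestion limit:

mx1001:~$ curl -s localhost:3903/metrics | awk -v now="$(date +%s)000" \
    '$3 ~ /^[0-9]+$/ && (now - $3) > 3600000 { stale++ } END { print stale+0 " samples older than 1h" }'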

For certain types of metrics it seems "natural" for them not to be updated for long stretches (they simply decay), so I'm not 100% sure there is anything to do here, or even whether this is a real error condition.

I'm seeing this same issue with the new mailman metrics.

The timestamps issue appears to have been fixed in v3.0.0-rc25. Is it time to upgrade mtail?
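
For reference, the mtail version currently installed on a host can be checked from the package metadata (assuming mtail is deployed from the Debian package):

$ dpkg -s mtail | grep '^Version:'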

Sounds good to me! Unfortunately Debian unstable ships rc24, but it should be straightforward to package rc25 or newer.
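
Once the newer package is rolled out, a quick way to sanity-check the change on a host is to count how many exposed samples still carry an explicit (millisecond) timestamp at all; whether that number should drop to zero depends on what exactly rc25 changed, so treat this as a rough spot check rather than a definitive test:

mx1001:~$ curl -s localhost:3903/metrics | awk '$3 ~ /^[0-9]{13}$/ { n++ } END { print n+0 " samples with explicit timestamps" }'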

This has been cleaned up with the rc35 upgrade. No more instances since 15/Jun.