Inconsistent time series data on mediawiki-errors Logstash dashboard
Open, Needs Triage · Public · BUG REPORT

Assigned To
None
Authored By
Tgr
Mar 11 2023, 7:57 AM
Referenced Files
F36905913: 90d.png
Mar 11 2023, 7:57 AM
F36905907: 7d.png
Mar 11 2023, 7:57 AM
F36905905: 24h.png
Mar 11 2023, 7:57 AM
F36905911: 30d.png
Mar 11 2023, 7:57 AM
F36905909: 14d.png
Mar 11 2023, 7:57 AM
Description

Here are some screenshots of https://logstash.wikimedia.org/goto/b32de0fd6fa3aca4432fa38f1e0f9e89 (from T328872) with different time range settings:

Last 14 days
14d.png (602×2 px, 96 KB)
Last 30 days
30d.png (592×2 px, 96 KB)

These aren't even remotely consistent. The 14d range shows ~600 errors, concentrated in the last 4 days. The 30d range shows ~20 errors (though the aggregate chart still says 600-ish), most of them in the first two weeks. Shorter time ranges seem consistent with the 14d one (24h: F36905905, 7d: F36905907); longer ones seem consistent with the 30d one (90d: F36905913). What's going on here?

The switch happens (at the time of writing, anyway) when I move the start of the range across Feb 21, 15:00.
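The check being done by eye here can be sketched in code: whatever the bucket size, two histograms of the same data should sum to the same total over any window they both cover. This is only an illustration. The function names and the sample data are hypothetical, nothing below is taken from the dashboard or queries Logstash, and it assumes the window boundaries line up with bucket boundaries.

```python
from datetime import datetime, timedelta


def total_in_window(buckets, start, end):
    """Sum the counts of buckets whose start time falls in [start, end)."""
    return sum(count for ts, count in buckets.items() if start <= ts < end)


def is_consistent(a, b, start, end, tolerance=0):
    """True if both histograms report the same total over the shared window."""
    diff = total_in_window(a, start, end) - total_in_window(b, start, end)
    return abs(diff) <= tolerance


# Made-up sample data: one day of errors at two bucket sizes.
day = datetime(2023, 3, 1)
next_day = day + timedelta(days=1)
fine = {day + timedelta(hours=6 * i): 10 for i in range(4)}  # four 6h buckets
coarse = {day: 40}   # one 1d bucket over the same data
broken = {day: 3}    # a 1d bucket that disagrees with the fine-grained view

print(is_consistent(fine, coarse, day, next_day))
print(is_consistent(fine, broken, day, next_day))
```

If the two views of the same window disagree, as the 14d and 30d screenshots do, the difference comes from how the data is queried or stored rather than from the bucket size.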

Error time series are important for correlating errors with code or configuration changes; it's bad if we can't have confidence in them.