Here are some screenshots of https://logstash.wikimedia.org/goto/b32de0fd6fa3aca4432fa38f1e0f9e89 (from T328872) with different time range settings:
Last 14 days | |
Last 30 days | |
These aren't even remotely consistent. The 14d range shows ~600 events, concentrated in the last 4 days. The 30d range shows ~20 errors (but the aggregate chart still says 600-ish), most of them in the first two weeks. Shorter time ranges seem consistent with the 14d one (24h: F36905905, 7d: F36905907); longer ones are consistent with the 30d one (90d: F36905913). What's going on here?
The switch happens (at the time of writing this, anyway) when I move the starting point of the range over Feb 21 15h.
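To pin the switch point down more precisely than by dragging the range by hand, one could bisect on the range start. This is only a rough sketch: `count_events` is a hypothetical helper (not an existing Logstash or OpenSearch API) that returns the number of events the chart shows for a range beginning at the given start time, and the sketch assumes there is exactly one switch between the two starting bounds, with the earlier start giving the low count and the later start the high count, as in the screenshots.

```python
from datetime import datetime, timedelta
from typing import Callable


def find_switch_point(
    count_events: Callable[[datetime], int],  # hypothetical: events shown for a range starting here
    early: datetime,                          # a range start known to give the low count (e.g. 30d ago)
    late: datetime,                           # a range start known to give the high count (e.g. 14d ago)
    threshold: int,                           # any value between the low and the high count
    resolution: timedelta = timedelta(hours=1),
) -> datetime:
    """Return the earliest probed range start that still gives the high count.

    The true switch lies within `resolution` before the returned value,
    assuming a single switch between `early` and `late`.
    """
    if not (count_events(early) < threshold <= count_events(late)):
        raise ValueError("expected a low count at `early` and a high count at `late`")

    # Invariant: count_events(early) < threshold <= count_events(late)
    while late - early > resolution:
        mid = early + (late - early) / 2
        if count_events(mid) < threshold:
            early = mid
        else:
            late = mid
    return late
```

With the numbers above, a threshold of around 100 would separate the ~20 and ~600 cases. If the result keeps moving over time, that would suggest the switch is tied to something relative to "now" rather than to a fixed timestamp.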
Error time series are important for correlating errors with code or configuration changes; it's bad if we can't have confidence in them.