Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | Feature | None | T22079 Provide a better means of status update delivery in WMF error message
Open | | None | T202061 Implement an accurate and easy to understand status page for all wikis
Resolved | | CDanis | T285569 Automated uploads of minimal & comprehensible timeseries metrics for statuspage display
Resolved | | CDanis | T298619 "User-reported connectivity errors" (NEL data) not being posted to statuspage since 1 Jan 00:00 UTC
Event Timeline
Here's the PromQL query that statograph runs to scrape data: link
Looking at that in Grafana Explore, data is missing for exactly 2022-01-01 00:00 UTC until 2022-01-03 00:00 UTC.
The data in Prometheus comes from an exporter that exports the results of an Elasticsearch query. That's configured here. Of particular note is the QueryIndices stanza, which tells the exporter to query, for a given time, the index corresponding to a certain year.week.
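Looking at the data in Logstash directly, it seems that NEL data for 2022-01-01 00:00 UTC until 2022-01-03 00:00 UTC was stored in the Elasticsearch index named w3creportingapi-1.0.0-2-2022.52, i.e. week 52 of 2022. Those two days actually belong to ISO week 52 of week-year 2021, so the index name pairs the calendar year with the ISO week number.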
This is apparently something that others have tripped over in the past, as it has to do with the horrifying mess that is ISO week date numbers: https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/541#issuecomment-270973321
So it seems that we need to use xxxx instead of YYYY in our index specification, and that we need to also make es_exporter understand 'weekyears'...
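The mismatch can be reproduced with Python's standard library. This is a minimal sketch, not the Logstash or es_exporter code: the helper names `calendar_year_suffix` and `weekyear_suffix` are made up for illustration, and it assumes the week number in the index name is the ISO 8601 week.

```python
from datetime import date


def calendar_year_suffix(d: date) -> str:
    """Calendar year + ISO week (hypothetical helper mirroring the old naming)."""
    _, iso_week, _ = d.isocalendar()
    return f"{d.year}.{iso_week:02d}"


def weekyear_suffix(d: date) -> str:
    """ISO week-year + ISO week (hypothetical helper mirroring the proposed naming)."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}.{iso_week:02d}"


# Days on either side of the 2021/2022 boundary.
for d in (date(2021, 12, 31), date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3)):
    print(d.isoformat(), calendar_year_suffix(d), weekyear_suffix(d))
```

The two suffixes only disagree for 1 and 2 January 2022, which matches the window where data is missing.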
There's also a separate issue here, which is that statograph is getting stuck on the interval where data is missing and still not uploading more. That needs investigation as well.
Per the linked upstream issue, Logstash uses Joda which uses this pattern syntax.
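QueryIndices in es-exporter configurations use date math support in index names, an Elasticsearch feature. Elasticsearch uses the pattern syntax of Java's built-in DateTimeFormatter, in which Y is the week-based year (the counterpart of Joda's x), whereas in Joda Y is the year of era.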
It seems we need two things for weekly indexes:
- Logstash should write weekly indexes with an xxxx.ww suffix
- es-exporter should query indexes with a YYYY.ww suffix
Change 751765 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] logstash: update weekly indexes to use weekyear pattern syntax
Change 751766 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] prometheus: update affected es-exporter configs to use weekyear
Index curation is affected as well, because Python's datetime formatter doesn't know weekyear in the same way. We ought to consider using curation based on field_stats or creation_date.
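For illustration only, here is one way a weekyear suffix could be mapped back to the dates it covers using `date.fromisocalendar` (Python 3.8+). This is a sketch under stated assumptions, not how the curation tooling works: `week_bounds` is a hypothetical helper, and the index name below is the name the index would presumably have under the weekyear scheme.

```python
from datetime import date, timedelta


def week_bounds(suffix):
    """Return (Monday, Sunday) for a 'weekyear.week' suffix (hypothetical helper)."""
    iso_year, iso_week = (int(part) for part in suffix.split("."))
    monday = date.fromisocalendar(iso_year, iso_week, 1)  # ISO weekday 1 is Monday
    return monday, monday + timedelta(days=6)


# Assumed index name under the weekyear scheme; not taken from the cluster.
index = "w3creportingapi-1.0.0-2-2021.52"
start, end = week_bounds(index.rsplit("-", 1)[1])
print(index, "covers", start.isoformat(), "to", end.isoformat())
```

Curating on creation_date, as suggested above, avoids parsing the name at all.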
Change 751765 merged by Cwhite:
[operations/puppet@production] logstash: update weekly indexes to use weekyear pattern syntax
Change 751766 merged by Cwhite:
[operations/puppet@production] prometheus: update affected es-exporter configs to use weekyear
Change 756041 had a related patch set uploaded (by CDanis; author: CDanis):
[operations/software/statograph@master] Add a start_timestamp constraint
It took just a single run of statograph -v upload_metrics -t 2022-01-03T00:00Z 1vzzyvjxzgsf to restore things to a good state -- once the most_recent_data_at timestamp had advanced past the missing data, automatic uploads worked again.
The gap from 01 Jan -- 03 Jan will soon rotate out of visibility on the public page, so I'm leaving it rather than doing more work to correct it.
Change 756041 merged by jenkins-bot:
[operations/software/statograph@master] Add a start_timestamp constraint