
The UI timestamp in Kibana should be based on source not Logstash intake (for PHP-FPM and MW)
Open, Needs Triage, Public

Description

There was a minor incident today causing the Kafka-to-Logstash intake to get clogged for a few hours. It confirmed a suspicion I've held for a while: the UI timestamp (used for bar charts and message feed sorting) appears to be based on Logstash intake time, not on when the message was dispatched from the source application.

For a while I assumed this was because the source timestamp is hard to obtain, or at least something we don't capture right now. However, I now understand that we do have it:

[Screenshot 2020-06-16 at 00.00.35.png]

It seems @timestamp is the field used for time slice queries, UI charts, sorting, etc., while timestamp (no @) is an unused internal field for debugging purposes, indicating when the message was dispatched in or to rsyslog, I think?

A similar issue exists with the Prometheus metrics we have for Logstash. However, once understood, I think those are much easier to work with as-is. It also makes logical sense that you can't (and shouldn't) backdate or mutate such metrics; they are simply about the intake.

For Kibana and developer UX, though, this intake information is imho the internal/debugging one, not the other way around. Could we flip this around for PHP/MW? Or perhaps for everything, if people don't mind?

Alternatively, if we want to ensure a consistent meaning for the non-@ internal timestamp as the original application dispatch time, perhaps we could make the two the same for PHP/MW, and, if we want to keep the Logstash intake time, store that in a separate field.
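
A minimal sketch of what that could look like as a Logstash filter, assuming the source-supplied field is named `timestamp` and using a hypothetical `logstash.intake_timestamp` field to keep the intake time; the actual field names and date formats would need to match our pipeline:

```
filter {
  # Hypothetical: preserve the value @timestamp holds on arrival
  # (the Logstash intake time) in a separate field before overwriting it.
  mutate {
    copy => { "@timestamp" => "[logstash][intake_timestamp]" }
  }
  # Backdate @timestamp to the source-supplied dispatch time, if it parses.
  date {
    match  => [ "timestamp", "ISO8601", "UNIX", "UNIX_MS" ]
    target => "@timestamp"
  }
}
```

If the source timestamp is missing or unparseable, @timestamp simply keeps the intake time, so malformed events would behave no worse than they do today.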

Event Timeline

Another example of where this leads to confusion:

[Screenshot 2020-06-16 at 00.00.37.png]

Based on this graph, I would suspect the error to be caused by some kind of automated job, given its regular interval and consistent-height spikes. In actuality, the spikes just reflect the processing cadence into Logstash, which makes the graph useless even after waiting out the intake delay.

It also wrongly gives the impression that data from 5 minutes ago is available, when in fact it is not (yet).

This sounds a lot like something we identified during the audit phase of T234565: a number of fields are created (and ultimately passed through transparently) that carry essentially the same data under different keys. IIRC, we want to consolidate on the source's timestamp and only provide our own if one is not available.
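
For what it's worth, a sketch of that consolidation rule as a Logstash conditional, again with the assumption that the source field is named `timestamp`; the real field names may differ per input:

```
filter {
  if [timestamp] {
    # Source supplied a timestamp: make it authoritative.
    date {
      match  => [ "timestamp", "ISO8601", "UNIX", "UNIX_MS" ]
      target => "@timestamp"
    }
  }
  # Otherwise @timestamp keeps Logstash's own intake time as the fallback.
}
```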