Our current process for refining data from EventLogging into Hadoop infers the type of each field instead of looking at the schema. As we discussed in the issue with NavigationTiming data, we're working on a longer-term fix for this. In the meantime, we audited where JSON schemas differ from Hive schemas and found that the ServerTiming schema declares duration as a number, but the instrumentation usually sends a value of 0. Is this field supposed to be a double, is it always an integer, or is there a bug in the instrumentation? Let us know; we can alter the Hive table and re-import if necessary to recover any truncated data.
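To illustrate the failure mode, here is a minimal sketch of what value-based type inference does to a field like this. It is not the actual refinement code: `infer_type`, `coerce`, and the sample events are hypothetical, and the only assumption is that the column type gets fixed from the first value seen.

```python
import json


def infer_type(value):
    """Guess a Hive-like column type from a single JSON value (hypothetical helper)."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def coerce(value, column_type):
    """Cast a value to the inferred column type, truncating fractions for bigint."""
    if column_type == "bigint":
        return int(value)
    if column_type == "double":
        return float(value)
    return value


# Made-up events: the first one carries the usual 0, a later one a fraction.
raw_events = ['{"duration": 0}', '{"duration": 12.7}']
events = [json.loads(line) for line in raw_events]

# The type is inferred from the data, not from the JSON schema ("number").
column_type = infer_type(events[0]["duration"])
stored = [coerce(event["duration"], column_type) for event in events]
```

With these inputs the column is inferred as `bigint`, so the later fractional value loses its fraction when stored.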
|mediawiki/extensions/NavigationTiming||master||+2 -2||Assume that Server Timing duration is expressed in ms|
This is presumably similar to the issue with NavigationTiming/deviceMemory from T214384 (which has since been resolved). We need to decide whether Analytics will apply the same fix to ServerTiming/duration.
I suppose it's harmless to do, but I don't know whether fractions can or should actually be used for this field as well. The unit is milliseconds, and the spec (and the MDN doc) says it's parsed as a double from the corresponding HTTP header, so it certainly can have a fraction.
Currently we're only using Server-Timing to pass Varnish caching information, which doesn't use the duration field.
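As a rough sketch of why the duration is usually 0: a metric in the `Server-Timing` header without a `dur` parameter has nothing to parse, and as far as I understand the spec the duration then defaults to 0. The parser below is a simplified illustration (it does not handle commas or semicolons inside quoted descriptions), and the header value and metric names are made up.

```python
def parse_server_timing(header):
    """Parse a Server-Timing header value into a list of metric dicts (simplified)."""
    metrics = []
    for entry in header.split(","):
        parts = [part.strip() for part in entry.split(";")]
        if not parts[0]:
            continue  # skip empty entries, e.g. from a trailing comma
        # Defaults used when "dur" or "desc" is absent.
        metric = {"name": parts[0], "duration": 0.0, "description": ""}
        for param in parts[1:]:
            key, _, value = param.partition("=")
            key = key.strip().lower()
            value = value.strip().strip('"')
            if key == "dur":
                try:
                    metric["duration"] = float(value)  # parsed as a double
                except ValueError:
                    pass  # keep the default when the value is not numeric
            elif key == "desc":
                metric["description"] = value
        metrics.append(metric)
    return metrics


# Hypothetical header: a cache metric without "dur" and a timed metric.
parsed = parse_server_timing('cache;desc="hit-front", db;dur=53.2')
```

Here the `cache` entry ends up with a duration of `0.0`, while `db` keeps its fractional value of `53.2`.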
Server-Timing is a freeform thing we can use. The spec says that it's a double: https://w3c.github.io/server-timing/#duration-attribute but it doesn't state that the unit is milliseconds. It can be whatever we want, and doesn't have to be a "duration" per se. Server-Timing is designed to allow us to pass anything we want.
We can certainly multiply and round what's passed, for the sake of consistency with other performance APIs and to turn a double into an integer. It's a good time to make that change, since the "duration" field hasn't been used yet.
As for Hive storage, let's consider that this is always an integer. I'll make the necessary change in NavigationTiming for future use (multiply by 1000 and round).
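For reference, the "multiply by 1000 and round" conversion could look roughly like the sketch below. The real change would live in the extension's client-side JavaScript; this Python version only shows the arithmetic. `to_integer_duration` and its `scale` parameter are hypothetical names, and the rounding mimics JavaScript's `Math.round` (halves round up) rather than Python's built-in `round`, which rounds halves to even.

```python
import math


def to_integer_duration(duration, scale=1000):
    """Scale a floating-point duration and round it to the nearest integer.

    Uses floor(x + 0.5) so that halves round up, like Math.round in JavaScript.
    """
    return math.floor(duration * scale + 0.5)


# Made-up sample values as they might arrive from the header.
samples = [0, 1.5, 0.0127]
converted = [to_integer_duration(value) for value in samples]
```

Every converted value is a plain integer, so the Hive column can stay an integer type regardless of what the header carried.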