Split off from T155639.
More details on this work are documented here and here.
E.g.
- Having found that there are sometimes page tokens (pageviews) with > 2 events, check that at least there is only one pageloaded event among them each time
In 0.0111% of events, page loaded happened more than once.
- Check if such pageviews (tokens) with multiple unloaded events will need to be filtered out during analysis, or whether their impact is likely small enough to be ignored
0.311% of pageviews have more than one unloaded event. We are filtering them out in the Reading time project.
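A minimal sketch of how such pageviews could be identified and dropped. The field names `page_token` and `action`, and the action values `pageLoaded` / `pageUnloaded`, are assumptions made for illustration; the actual schema and the Reading time project's filtering code may differ.

```python
from collections import Counter

def tokens_with_repeated_action(events, action):
    """Return the page tokens that have more than one event of the given action."""
    counts = Counter(e["page_token"] for e in events if e["action"] == action)
    return {token for token, n in counts.items() if n > 1}

def drop_pageviews(events, bad_tokens):
    """Remove every event that belongs to one of the flagged page tokens."""
    return [e for e in events if e["page_token"] not in bad_tokens]

# Hypothetical sample: token "b" has two unloaded events.
events = [
    {"page_token": "a", "action": "pageLoaded"},
    {"page_token": "a", "action": "pageUnloaded"},
    {"page_token": "b", "action": "pageLoaded"},
    {"page_token": "b", "action": "pageUnloaded"},
    {"page_token": "b", "action": "pageUnloaded"},
]
repeated_unloads = tokens_with_repeated_action(events, "pageUnloaded")
clean = drop_pageviews(events, repeated_unloads)
```

Filtering by token rather than by individual event removes the whole pageview, so no half-recorded pageviews remain in the analysis set.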
- Consistency checks: totalLength and visibleLength should be less (apart from rounding errors) than the difference between timestamps of the loaded and unloaded events (@Zareenf already worked on these)
We have about 7% of cases where these lengths are more than 2 seconds greater than the difference between the timestamps, and about 1.5% of cases where they are more than 5 seconds greater.
We also observe periodicity in the distribution of errors, with a period of about 40 seconds. We are investigating this further.
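The consistency check above can be sketched roughly as follows. This assumes the lengths are reported in milliseconds and that the loaded and unloaded timestamps are already parsed into `datetime` objects; both are assumptions, and the function names are hypothetical.

```python
from datetime import datetime

def length_excess_seconds(loaded_ts, unloaded_ts, length_ms):
    """Seconds by which a reported length exceeds the loaded-to-unloaded interval."""
    elapsed = (unloaded_ts - loaded_ts).total_seconds()
    return length_ms / 1000.0 - elapsed

def share_exceeding(excesses, threshold_s):
    """Fraction of cases whose excess is strictly greater than threshold_s."""
    if not excesses:
        return 0.0
    return sum(1 for x in excesses if x > threshold_s) / len(excesses)

# Hypothetical pageview lasting 30 seconds according to the event timestamps.
loaded = datetime(2017, 1, 1, 12, 0, 0)
unloaded = datetime(2017, 1, 1, 12, 0, 30)
excesses = [
    length_excess_seconds(loaded, unloaded, ms)
    for ms in (31000, 36000, 20000, 33000)
]
share_over_2s = share_exceeding(excesses, 2)
share_over_5s = share_exceeding(excesses, 5)
```

A positive excess means the reported length is longer than the interval between the two events, which is the inconsistent case counted above.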
- Generate histogram for totalLength
- Generate histograms for visibleLength
Here's a plot of the distribution of the lengths (logged)
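For reference, one way to bin lengths on a log scale before plotting is sketched below. It assumes "logged" means a base-10 log transform and that lengths are in milliseconds; non-positive values are skipped because their logarithm is undefined. The function name and the `bins_per_decade` parameter are hypothetical.

```python
import math
from collections import Counter

def log10_histogram(lengths_ms, bins_per_decade=1):
    """Count positive lengths per log10 bin; the key is the bin's lower edge index."""
    counts = Counter()
    for value in lengths_ms:
        if value <= 0:
            continue  # log is undefined for zero and negative lengths
        counts[math.floor(math.log10(value) * bins_per_decade)] += 1
    return dict(sorted(counts.items()))

# Hypothetical sample; with one bin per decade, key 3 covers 1000 ms up to 10000 ms.
hist = log10_histogram([500, 1500, 2500, 120000, -3, 0])
```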
- Investigate cause of periodicity in discrepancies between total/visibleLength and event timestamps.
This was an artifact of a bug parsing datetimes.
This plot shows the distribution of discrepancies with the bug fixed.
- Investigate cause of total/visibleLength values that are negative or very large.
We did not see any patterns of negative values. However, we do observe a still-unexplained periodicity in the frequency of negative values.
Only 0.0019% of events have a negative totalLength and 0.010% of events have a negative visibleLength.
Only 2.77% of events have a total length greater than 1 hour, and 1.03% of events have a visible length greater than 1 hour.
Only 0.189% of events have a total length greater than 12 hours, and 0.460% of events have a visible length greater than 12 hours.
As these numbers are quite small, especially in the negative cases, I do not feel that investigation of this matter is urgent.
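The shares quoted above could be computed along these lines. This is only a sketch, assuming lengths in milliseconds; the function and key names are hypothetical. Note that the "over 1 hour" share includes the "over 12 hours" cases, matching how the figures above are stated.

```python
HOUR_MS = 60 * 60 * 1000

def outlier_shares(lengths_ms):
    """Fraction of lengths that are negative, over 1 hour, and over 12 hours."""
    n = len(lengths_ms)
    if n == 0:
        return {"negative": 0.0, "over_1h": 0.0, "over_12h": 0.0}
    return {
        "negative": sum(1 for v in lengths_ms if v < 0) / n,
        "over_1h": sum(1 for v in lengths_ms if v > HOUR_MS) / n,
        "over_12h": sum(1 for v in lengths_ms if v > 12 * HOUR_MS) / n,
    }

# Hypothetical sample: one negative, one normal, one over 1 hour, one over 12 hours.
shares = outlier_shares([-5, 1000, 2 * HOUR_MS, 13 * HOUR_MS])
```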
(See also notes)