The network transfer and network round-trip time metrics are fairly stable in CI.
However, the processing time metrics (loadEventEnd, firstPaint, etc.) have a fair bit of variance, causing quite a lot of false positives (warnings without a real regression). I feel like this has gotten worse in recent weeks, possibly since the PHP 7.2 upgrade, which may also have changed other aspects of the base image (Chromium version, Debian minor version patches, etc.).
Ideas:
- T223977: Use Mann-Whitney U test to compare sampled times in Fresnel
- Make it less sensitive (e.g. only warn when the increase is at least 5 milliseconds, or at least 40%).
- Disable the threshold for these metrics, so that they only show in the info-diff table, without red text and without voting on the exit code.
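To make the first idea (T223977) a bit more concrete, here is a minimal sketch of a Mann-Whitney U test over two sets of sampled timings. It is written in Python for brevity and is not Fresnel's actual code; the function name `mann_whitney_u`, the sample values and the 0.05 cut-off are all assumptions for illustration. It uses the normal approximation with tie and continuity corrections, which is only reasonable with a handful or more samples on each side.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for sample a, two-sided p-value).

    Hypothetical helper: normal approximation with tie correction and
    continuity correction, not an exact test.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # Pool the values, remembering which sample each one came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = n1 + n2

    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of equal values starting at index i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j

    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # every value is identical, so no evidence of a shift

    z = max((abs(u_a - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p = math.erfc(z / math.sqrt(2.0))  # two-sided tail of the standard normal
    return u_a, min(p, 1.0)


if __name__ == "__main__":
    # Made-up loadEventEnd samples (ms) from "before" and "after" runs.
    before = [212.0, 198.0, 240.0, 205.0, 221.0, 230.0, 201.0]
    after = [215.0, 199.0, 238.0, 209.0, 226.0, 228.0, 204.0]
    u, p = mann_whitney_u(before, after)
    verdict = "warn" if p < 0.05 else "ok"  # 0.05 is an assumed cut-off
    print(f"U={u:.1f} p={p:.3f} -> {verdict}")
```

The appeal over a fixed threshold is that the test compares the ranks of all samples rather than a single aggregate, so one slow outlier run is less likely to trigger a warning on its own.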
Disabling might be the best option, given that these metrics currently capture a very wide range of processing activities, including a lot of idle time spent on other OS/network/disk interaction.
Once we have T221179, we will have smaller buckets of time metrics that capture only the processing of our own JavaScript code, with strict start and end times for those sequences of (mostly) dedicated CPU time. That will be less prone to variance, and will also have the benefit of telling you exactly where the regression was found (and thus whether it could be related to your change).
See also: