We have one URL that stands out in our tests (both on WebPageTest and Browsertime/WebPageReplay) as having less stable metrics than the rest of the URLs. This is mostly a problem for the synthetic testing we are doing with Firefox.
Here we test the URL with five runs on WebPageTest, and the start render time varies between 3100 and 3900 ms:
http://wpt.wmftest.org/result/180328_WA_PB/
For comparison, the start render time for the Barack Obama page stays between 3900 and 4000 ms (http://wpt.wmftest.org/result/180328_V0_R6/).
We have the exact same situation with Browsertime/WebPageReplay, even after we applied the "trick" of enabling the MOZ log, which makes Firefox slower and makes the page more stable for other URLs. With the log turned on, the first visual change varies between 3166 and 3600 ms (even though we use WebPageReplay).
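To make the ranges above easier to compare, here is a minimal sketch that expresses each one as a spread relative to its midpoint. It uses only the min/max values quoted in this task, not the individual runs, and the helper name `relative_spread` and the labels are made up for illustration.

```python
def relative_spread(low_ms, high_ms):
    """Return the width of the range divided by its midpoint."""
    midpoint = (low_ms + high_ms) / 2
    return (high_ms - low_ms) / midpoint


# Min/max values as quoted in this task, in milliseconds.
# Only the endpoints are known here, not the per-run values.
ranges = {
    "unstable URL, WebPageTest start render": (3100, 3900),
    "Barack Obama, WebPageTest start render": (3900, 4000),
    "unstable URL, Browsertime/WebPageReplay first visual change": (3166, 3600),
}

for label, (low, high) in ranges.items():
    spread = relative_spread(low, high)
    print(f"{label}: {high - low} ms wide, {spread:.1%} of midpoint")
```

With the per-run values available, a standard deviation or coefficient of variation would probably be a better measure than the min/max range.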
Checking the HAR (I'll enable HARs again and add one to the issue), the waterfall looks pretty much the same, so we need to dig deeper. One thing I've seen with Firefox is that DOMInteractive seems to follow the same pattern as first visual change (low DOMInteractive, low first visual change), but that is not true for Sweden.