- Affected metrics
- RUM Median story
- RUM p95 story
- WebPageTest story
- ResourceLoader traffic affected
NavigationTiming metrics around September 8th:
(Links to Grafana viewing from 20160801 to 20160918)
|domInteractive (From navigationStart to document received, parsed, and blocking resources ready)||Median: -50ms (from 870ms to 820ms)||-|
|p95: -6% (4.5s to 4.2s)||-|
|firstPaint||Median: unaffected (~1000ms)|
|p95: +441% (from 4.9s to 21.6s)|
|domComplete||Median: -20% (from 1900ms to 1500ms)|
|p95: -8% (from 9.4s to 8.7s)||-|
|loadEventEnd||Median: -25% (from 2000ms to 1600ms)|
|p95: -10% (from 10s to 9s)||-|
|mediaWikiLoadComplete (Time startup.js loaded execution and NavTiming module being loaded)||Median: +400ms (from 900ms to 1300ms)|
|p95: +100% (from 5s to 10s)|
RUM Median story
For the median use case it seems like events mostly re-ordered, in a way that ultimately actually makes the page finish loading quicker. Assuming that this is caused by "resourceloader: Make cache-eval asynchronous ", it happened by letting the browser prioritise getting the page ready, on-screen and interactive; in favour of spreading out module execution.
Here too overall page load is faster. But JS execution is spread out even more. This could be due to various reasons. The most likely reasons is a combination of the following three reasons.
- requestIdleCallback in chrome causing a delay upto 30s (fixed in master)
- setTimeout being deprioritized in background tabs (probably acceptable and desirable; not sure if a significant amount of page views initialise in a background tab and/or pre-render).
- Cache-eval goes in chunks of 50ms. On slower devices (e.g. mobile) will require a lot of chunks, compared to the previous approach of executing everything in one go. However the downside is that this will render the user's browser unresponsive for a long time - which is arguably a much worse user experience. To be confirmed.
(user: Anonymous, project: enwiki, page: Facebook)
Desktop first view:
- SpeedIndex: Unaffected.
- Start render:
- us-east1/Firefox: Consistently hitting the low-end more frequently (range unchanged).
- us-east1/Chrome: Range is consistently 100ms lower.
- us-east1/IE: Unchanged.
- Fully loaded:
- us-east1/Firefox: Unchanged.
- us-east1/Chrome: Consistently 100ms lower.
- us-east1/IE: Unchanged.
- Firefox: Unchanged.
- Chrome: Consistently ends 100ms earlier.
- IE: No clear change.
Desktop repeat view (Chrome only):
- SpeedIndex: 1.2s -> 800ms.
- Start render: 790ms -> 570ms.
- Fully loaded: 2.6s -> 2.3s.
Real Mobile (Motorola G, 3G-Fast):
- SpeedIndex: About 200 lower on average
- fullyLoaded: No clear change.
- mwLoadStart/End: No clear change.
- render: No clear change.
Second page view (Chrome, emulateMobile):
- SpeedIndex: -100 (2014 -> 1930)
- render: -100 (1990 -> 1891)
- fullyLoaded: +300ms (2.1s -> 2.4s)
- mwLoadStart: +/-0 (210ms)
- mwLoadEnd: +200ms (2.4s -> 2.6s)
ResourceLoader traffic affected
In addition, it appears ResourceLoader traffic was also affected in interesting ways.
On September 8 between 18:45 and 19:15 there was a very sudden increase in total traffic to load.php. Most of these extra requests had an If-None-Match header, and were responded to with HTTP 200 and a short cache-expiry.
Log entries confirm that this is indeed also caused by the async cache-eval change:
19:02 logmsgbot: demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.18