This is a task to coordinate the release of a subset of real user performance data that was collected while conducting the first round of research for this project: https://meta.wikimedia.org/wiki/Research:Study_of_performance_perception_on_Wikimedia_projects That research led to a short paper entitled "A Large-scale Study of Wikipedia Users' Quality of Experience", due to be presented and published at The Web Conference 2019.
I can share the camera-ready version of the paper privately with anyone at Wikimedia who might be interested before its publication, as it might help understand why this specific chunk of data is being requested for publication.
I expect that Analytics, Legal and Security will want to review this dataset. Feel free to create dedicated subtasks for each team.
2018-05-24 12:55:12 -> 2018-10-15 11:59:52
Data was collected on cawiki, frwiki, enwikivoyage and ruwiki. We need at the very least data for ruwiki.
The following have all been collected client-side, via the NavigationTiming extension:
- wiki Which wiki the request was on (ruwiki, cawiki, eswiki, frwiki or enwikivoyage)
- time Timestamp. It can be rounded to the minute or the hour if needed; we don't need second accuracy at all. It is still useful in the study to demonstrate the lack of temporal correlation (time of day, day of week, day of month). Since we don't need the timestamp to be the real one to prove lack of temporal correlation, the timestamp values should be shifted by an arbitrary value for the entire dataset.
- unload  The time spent on unload (unloadEventEnd - unloadEventStart).
- redirecting  Time spent following redirects.
- fetchStart  The time immediately before the user agent starts checking any relevant application caches.
- dnsLookup  Time it took to resolve names (domainLookupEnd - domainLookupStart).
- secureConnectionStart  The time immediately before the user agent starts the handshake process to secure the current connection.
- connectStart  The time immediately before the user agent starts establishing the connection to the server to retrieve the document.
- connectEnd  The time immediately after the user agent finishes establishing the connection to the server to retrieve the current document.
- requestStart  The time immediately before the user agent starts requesting the current document from the server, or from relevant application caches or from local resources.
- responseStart  The time immediately after the user agent receives the first byte of the response from the server, or from relevant application caches or from local resources.
- responseEnd  The time immediately after the user agent receives the last byte of the current document or immediately before the transport connection is closed, whichever comes first.
- loadEventStart  The time immediately before the load event of the current document is fired.
- loadEventEnd  The time when the load event of the current document is completed.
- mediawikiLoadEnd MediaWiki-specific. The time at which all ResourceLoader modules for this page have completed loading and executing.
- domComplete  The time immediately before the user agent sets the current document readiness to "complete".
- domInteractive  The time immediately before the user agent sets the current document readiness to "interactive".
- gaps  The gaps in the Navigation Timing metrics. Calculated by taking the sum of: domainLookupStart - fetchStart, connectStart - domainLookupEnd, requestStart - connectEnd and loadEventStart - domComplete.
- firstPaint  The time when something is first displayed on the screen.
- rsi  RUMSpeedIndex. Estimate of the SpeedIndex value based on ResourceTiming data. Now moved to the RUMSpeedIndex EventLogging schema, but was collected as part of the NavigationTiming schema at the time of the study.
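To make the derived fields above concrete, here is a minimal sketch of how the timestamp shift and the unload, dnsLookup and gaps values could be computed from raw Navigation Timing values. It is an illustration only, not the code used for the study: the function names, the `ARBITRARY_SHIFT` value and the dictionary input are all hypothetical, and truncating to the hour is just one of the rounding options mentioned above.

```python
from datetime import datetime, timedelta

# Hypothetical offset; the real one would be picked once for the whole
# dataset and not published.
ARBITRARY_SHIFT = timedelta(days=37, hours=5)


def shift_and_truncate(ts: datetime, shift: timedelta = ARBITRARY_SHIFT) -> datetime:
    """Shift a timestamp by a constant offset, then truncate it to the hour.

    The same offset is applied to every row, so intervals between events
    are preserved while the real date and time are hidden.
    """
    shifted = ts + shift
    return shifted.replace(minute=0, second=0, microsecond=0)


def derived_metrics(t: dict) -> dict:
    """Compute unload, dnsLookup and gaps from raw Navigation Timing values.

    `t` is assumed to map Navigation Timing attribute names to numbers
    (milliseconds); this input shape is an assumption for the example.
    """
    return {
        "unload": t["unloadEventEnd"] - t["unloadEventStart"],
        "dnsLookup": t["domainLookupEnd"] - t["domainLookupStart"],
        # Sum of the four intervals listed in the "gaps" field description.
        "gaps": (
            (t["domainLookupStart"] - t["fetchStart"])
            + (t["connectStart"] - t["domainLookupEnd"])
            + (t["requestStart"] - t["connectEnd"])
            + (t["loadEventStart"] - t["domComplete"])
        ),
    }
```

Using one constant offset for every row, rather than a per-row random one, is what keeps the data usable for showing the presence or absence of periodic patterns.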
And the following metrics, that are derivatives of metrics coming from NavigationTiming, designed to preserve privacy:
- speed_quantized The page download speed, evaluated as (transferSize * 8) / (loadEventStart - fetchStart) and quantized into these bins: [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 20000]. The sensitive metric is transferSize, the size of the gzipped HTML of the measured article.
- speed_over_median_per_country The page download speed (evaluated as above) normalized over the median per-country speed observed in the dataset.
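The two privacy-preserving fields above could be derived roughly as follows. This is a sketch under assumptions, not the study's actual code: the function names are hypothetical, the quantization rule (map each value to the lower edge of its bin, with anything at or above the last edge mapped to that edge) is a guess at what "quantized" means here, and the `(country, speed)` input is an assumed shape. If transferSize is in bytes and the timings are in milliseconds, the speed comes out in bits per millisecond, i.e. kbit/s.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median

BINS = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 20000]


def download_speed(transfer_size: float, fetch_start: float, load_event_start: float) -> float:
    """(transferSize * 8) / (loadEventStart - fetchStart), as in the field description."""
    duration = load_event_start - fetch_start
    if duration <= 0:
        raise ValueError("loadEventStart must be later than fetchStart")
    return (transfer_size * 8) / duration


def quantize(speed: float, bins: list = BINS) -> int:
    """Map a speed to the lower edge of the bin that contains it (assumed rule)."""
    idx = bisect_right(bins, speed) - 1
    return bins[max(idx, 0)]


def speed_over_median_per_country(rows: list) -> list:
    """Divide each speed by the median speed of its country.

    `rows` is a list of (country, speed) pairs. The country is only used
    to compute the medians and is not part of the output.
    """
    by_country = defaultdict(list)
    for country, speed in rows:
        by_country[country].append(speed)
    medians = {c: median(v) for c, v in by_country.items()}
    return [speed / medians[country] for country, speed in rows]
```

A country whose median speed is zero would raise a division error in this sketch; a real pipeline would need to decide how to handle that case.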
Finally, the response users gave to the perception survey:
- surveyResponseValue Can be "yes", "no", or "not sure". The question asked was "Did this page load fast enough?".
The NavigationTiming metrics come from the browsers' implementations of the NavigationTiming API (level 1 and level 2).
firstPaint comes from the Paint Timing API or vendor-specific implementations predating the standards.
RUMSpeedIndex is a compound metric combining several NavigationTiming and ResourceTiming (level 1 and level 2) metrics into a single score. It's a 3rd-party FLOSS library found here: https://github.com/WPO-Foundation/RUM-SpeedIndex
EventLogging schemas these fields are coming from: