Page MenuHomePhabricator

Investigate how connection speed varies by country
Closed, ResolvedPublic

Event Timeline

Tbayer claimed this task.
Tbayer raised the priority of this task from to Needs Triage.
Tbayer updated the task description. (Show Details)

Below is a list of how speed varied by country (measued as the median of loadEventEnd from the NavigationTiming schema), during the five weeks from Dec 23-Jan 26 . The global median was (EDIT:) 1703 milliseconds during this time (1592 milliseconds during January 27).

countrymedian of loadEventEnd (in ms)# of samples
LI898.546
CH108414623
AX109397
DK111722799
SE111740730
PM11255
GG1141.5358
LU1137.51432
VA114113
MC124080
NO117422810
NL117647814
BE122220208
JE1251607
GI1258247
IS1296.52786
FO1348.5276
AD135488
DE134974946
FI135123215
CA1357250841
GB1360535097
KP13633
AT136511504
IM1383.5640
CZ1405.514800
SI1412.57950
US14121940015
LT14156350
EE14235503
EU14363380
LV14504099
HU146215524
SM147115
BM1477.5310
BG148712137
RU151026957
SK15287439
IL154418647
FR154443388
RO154522943
PT155715301
PL157132921
MT1579.52994
UA16129675
HR165514098
ES167133193
ST2207.54
IT173839487
RS173917211
AW1765300
GE17683927
MD1769.51438
JP177821061
BY17991928
GR180725796
IE180937988
KY1902302
CW1896338
VI1908246
KR191615737
MK19254775
PR19284116
BA19345425
MX197226769
TW1996.511924
PA20271679
AE203322980
ME20351282
HK210119492
AM21141486
KZ21511816
QA21945036
BS2210932
AL22114187
MF239914
AI221839
NZ221927147
MO2230551
TR2234.525154
CR22642414
PF228475
DO22972338
PE23245165
AZ2325.52072
SG234837959
BB2373.5898
CY23802559
CO23987156
BQ239841
KW24003770
GL2430.560
SV2420.5934
TT24193093
GT24221245
GU2439.5530
TC2451.584
UY24481539
AU2454142711
MN24611285
CL25097082
BH2521.52248
DM2542.5172
SA2541.514716
BR255138881
VG2589108
MQ265496
KN2616125
EC2624.52334
GW26587
AR266911924
PS2669649
SX268993
GD2790.5192
JO27862534
HN2812.5958
CK2930.520
VN284810189
OM2859.52520
NI2865541
CN286518987
TN2872.51770
RE2905185
MA29073831
TH291315979
A13035.5258
CV301955
NC3061.558
WS347418
KG3191339
IQ32142463
LC3219281
GF323141
AP32341609
TO3374.524
VE32834369
SR3290462
GP3291139
LB3295.53680
JM33762891
VC3399115
MY340341040
AG3444238
BN34561156
DZ35662666
EG35748886
BZ3638.5410
GY3664526
TJ366577
LK36696796
BO3822696
ER9784.54
PY3861690
MS38795
MP3940115
ID400534620
PH403474659
AS409650
KH40441401
ZA404922840
CI4064180
YT411111
UZ4123451
IR420411935
FJ4251.5566
HT4279191
SC4307117
RW4347281
MZ4412267
GQ44367
AO4490558
ML463260
PW459016
MU45902132
BD459210430
MV4607749
LA4655.5268
SN4670259
AF4717463
PK4730.530474
DJ4744169
KE47964651
GA4930.542
IN4835319293
BT4910169
MG5040154
VU499629
ET5046.51100
CD5081109
LY5106447
NA5263491
ZW5334795
TM538281
SL580254
CG570233
KM57233
UG57351074
SY5779388
SB579439
SZ5909.544
MH592917
NG59308362
MM5996.51820
GH6112.52456
FM627326
PG6216213
TZ6353.52110
MR639145
CM6573369
ZM6620644
BI663227
BW6753.5502
NP67943341
LR7133.5146
TL703051
TG745041
MW7501203
BJ751979
IO75577
SO7597343
YE7655285
CF8667.54
GM771981
SD7979639
CU8173.5138
BF827239
LS878875
FK899119
GN926533
TD95739
SS982027
NU437722
A21056898
NE10827.530
NR153721
KI238775
WF1061611

Queries used (BTW the method to calculate medians is adapted from JonK:

SET @rownum = 0
SELECT AVG(loadEventEnd) AS median_loadEventEnd FROM
  (SELECT (@rownum:= @rownum +1) AS row_number, loadEventEnd FROM (
    SELECT event_loadEventEnd AS loadEventEnd FROM log.NavigationTiming_15033442 WHERE timestamp LIKE '20160127%' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki' ORDER BY loadEventEnd) AS unnumberedlist
  ORDER BY loadEventEnd) AS numberedlist
INNER JOIN
  (SELECT COUNT(*) AS total_rows FROM log.NavigationTiming_15033442 WHERE timestamp LIKE '20160127%' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki') AS row_counting
ON 1=1
WHERE
numberedlist.row_number >= row_counting.total_rows/2 AND
numberedlist.row_number <= row_counting.total_rows/2+1

SET @rownum = 0
SELECT numberedlist.country, AVG(numberedlist.loadEventEnd) AS median_loadEventEnd, AVG(rows_per_country.total_rows) AS samples FROM
  (SELECT IF(@prev <> ordered_data.country, @rownum:=1, @rownum:= @rownum +1) AS row_number, ordered_data.loadEventEnd, @prev:= ordered_data.country, ordered_data.country AS country FROM (
       SELECT event_loadEventEnd AS loadEventEnd, event_originCountry AS country
       FROM log.NavigationTiming_15033442
       WHERE timestamp BETWEEN '20151223' AND '20160127' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki'
       ORDER BY country, loadEventEnd) ordered_data
  ORDER BY country, loadEventEnd) AS numberedlist
INNER JOIN
  (SELECT COUNT(*) AS total_rows, event_originCountry AS country FROM log.NavigationTiming_15033442
  WHERE timestamp BETWEEN '20151223' AND '20160127' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki' GROUP BY country)
  AS rows_per_country
ON numberedlist.country = rows_per_country.country
WHERE
numberedlist.row_number >= rows_per_country.total_rows/2 AND
numberedlist.row_number <= rows_per_country.total_rows/2+1
GROUP BY numberedlist.country
ORDER BY numberedlist.loadEventEnd

(edited to add:)
There are some sorting errors in the list (e.g. CF should be further down), also for the mobile version below. The reason is not clear to me; for now I'm assuming that the values are correct and MySQL/MariaDB just messed up the last ORDER BY.

(edited to add:)

SELECT AVG(loadEventEnd) AS median_loadEventEnd FROM
  (SELECT (@rownum:= @rownum +1) AS row_number, loadEventEnd FROM (
    SELECT event_loadEventEnd AS loadEventEnd FROM log.NavigationTiming_15033442 WHERE timestamp BETWEEN '20151223' AND '20160127' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki' ORDER BY loadEventEnd) AS unnumberedlist
  ORDER BY loadEventEnd) AS numberedlist
INNER JOIN
  (SELECT COUNT(*) AS total_rows FROM log.NavigationTiming_15033442 WHERE timestamp BETWEEN '20151223' AND '20160127' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki') AS row_counting
ON 1=1
WHERE
numberedlist.row_number >= row_counting.total_rows/2 AND
numberedlist.row_number <= row_counting.total_rows/2+1

median_loadEventEnd
1703.0000

...and here is a chart showing the distribution of loadEventEnd globally and for some of the slowest and fastest countries:

Wikimedia load time distribution - CH, US, IN, GH, NP, global (2015-12-23..2016-01-26).png (522×1 px, 45 KB)

(with some slight rounding errors due to sampling and presentation)

...and here is a chart showing the distribution of loadEventEnd globally and for some of the slowest and fastest countries:

Wikimedia load time distribution - CH, US, IN, GH, NP, global (2015-12-23..2016-01-26).png (522×1 px, 45 KB)

(with some slight rounding errors due to sampling and presentation)

Thanks, Tilman. For my own edification, what tool did you use to create the chart? It looks like Google Charts, but I am not sure. It is clear and crisp.

Thanks, Tilman. For my own edification, what tool did you use to create the chart? It looks like Google Charts, but I am not sure. It is clear and crisp.

Sorry, meant to add the details for this one too: The spreadsheet with the chart and the source data is here (you can change the country names in the green cells to generate the same chart for a different set of countries). Data obtained via:

SELECT FLOOR(LOG2(event_loadEventEnd/1000)*2)/2 AS bucket, event_originCountry AS country, COUNT(*) FROM log.NavigationTiming_15033442 WHERE timestamp BETWEEN '20151223' AND '20160127' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki' GROUP BY bucket, country ORDER BY bucket, country

I restricted it to enwiki per your rationale at T119797#1853006 , but not yet to the mobile site - starting the queries for that now, too.

And here is the analogous data and chart restricted to the mobile site (based on seven weeks' worth of data, i.e. two more than above):

loadEventEnd on en.m.wikipedia.org (during Dec 22, 2015 to Feb 8, 2016):

Global median: 2310ms

Chart showing the distribution of loadEventEnd globally and for some of the slowest and fastest countries:

Wikimedia load time distribution on mobile - CH, US, IN, GH, NP, global (2015-12-22..2016-02-08).png.png (522×1 px, 48 KB)

Median speed by country:

countrymedian_loadEventEndsamples
PM5981
LI108614
GG1325158
MC1523.528
DK15388144
NO15599682
SE156116285
GI1567103
JE1574.5316
CH16225239
LU1652518
ST67632
GB1738313000
IM1793354
NL183817331
US18711215635
IS1879925
CA1905119779
FO191287
AW1983172
BE21235447
FI21929177
BM2277.5126
DE221025235
LV22291075
AD2291.538
AT22764135
GL2370.530
LT22951601
SI23172403
EE23341423
PF235325
GW36014
CK23739
FR2377.511928
PR23872580
JP23987123
IE240224651
HU24083883
KY2432.5174
VI2436127
IL24776123
CZ2502.53066
SK25332051
AI2731.518
MT2582.5936
PT26153756
BS2624591
RO26247121
AE262512059
ES263312031
MD2639.5358
MX264613211
BG2655.53156
UA26732210
IT268715971
HR2687.54930
PL27277512
SM27371
RU2763.56082
CW2849.5198
GE2854871
RS28664915
KR28685996
KW28852464
GR28887430
BA29021850
QA29132681
AX296228
BH29431547
SX298647
ME2988.5550
AL29872034
NZ298913390
SG299120477
DO30111261
TC311752
HK30616804
BY3095440
MK3113.51666
AU311780976
TW31203432
PA3169912
GU3174305
AZ3227.51032
MO3263.5204
MN3268.5388
AM3282468
CY33031041
TR33718794
CR33841286
SA33778524
KN338353
BB3393519
CO3440.53150
PS3511242
GT3480663
MQ3547.542
TT35001655
EC3539874
EU356077
HN3580500
CV5532.520
JO36251456
SV3634462
KZ3650766
VG367859
CL36982352
MF37379
BR378212098
PE37991633
MP3850.554
NC419614
IQ3840.51594
NI3848265
SR3875.5342
MS43764
OM39101569
LB39982392
RE400759
GP412168
MA4120.51648
LC4136.5150
GD421684
DM414475
AS427618
UY4160547
BQ447316
VC420775
GF437316
AR42893728
TH43727043
BN4398741
AG4458.5176
TN4484.5514
VN44913249
BZ4522215
MY4526.523834
PW45303
EG4637.54092
KG4666198
SC469556
JM47371799
CN47605650
TO47955
TJ506244
VE49861290
SL5235.526
BO5121367
ZA516216995
ID5172.521464
HT5235113
MZ5242165
FM57146
YT62284
LK53103710
DZ5318856
PY5357335
AO5372437
GY5496283
MV5621459
MG562659
KH5642793
A1584896
BT5727128
CG570215
LA5755169
PH576530807
IR5773.55192
MU57821051
SB5887.524
GQ12629.52
PK595717408
FJ6049391
WS60807
FK8118.58
CI626397
UZ6370.5210
CU648912
LY6471193
AF6568242
MM65531685
KE6607.52946
CD664955
ZW6786335
DJ680953
BD68465272
IN6976194134
AP748596
PG7196.5168
CM7246209
TL725427
NG72827258
ML732129
SZ7468.532
SY7387237
NA7427296
SN7478122
RW7533.5130
GH75461880
IO75533
ZM7579445
MR799920
GA7673.524
NP7692.51904
BJ787360
TZ77881581
UG7792667
TM789152
ET7957352
TD8888.56
MW8423169
BW8450.5328
GM858852
VU98958
BF886425
LR894299
SO9029156
BI903815
SD9101544
YE9412208
NE946021
SS10196.520
GN992225
MH99639
NU101341
LS1075959
KI177032
TG1352414
A21267526

(I'll see to make this into a nice color-coded world map.)

Queries used:

Global median:

SET @rownum = 0;
SELECT AVG(loadEventEnd) AS median_loadEventEnd FROM
  (SELECT (@rownum:= @rownum +1) AS row_number, loadEventEnd FROM (
    SELECT event_loadEventEnd AS loadEventEnd FROM log.NavigationTiming_15033442 WHERE timestamp BETWEEN '20151222' AND '20160209' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki' AND event_mobileMode IS NOT NULL ORDER BY loadEventEnd) AS unnumberedlist
  ORDER BY loadEventEnd) AS numberedlist
INNER JOIN
  (SELECT COUNT(*) AS total_rows FROM log.NavigationTiming_15033442 WHERE timestamp BETWEEN '20151222' AND '20160209' AND event_mobileMode IS NOT NULL AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki') AS row_counting
ON 1=1
WHERE
numberedlist.row_number >= row_counting.total_rows/2 AND
numberedlist.row_number <= row_counting.total_rows/2+1;

Median per country:

SET @prev ='init';
SET @rownum = 0;
SELECT numberedlist.country, AVG(numberedlist.loadEventEnd) AS median_loadEventEnd, AVG(rows_per_country.total_rows) AS samples FROM
  (SELECT IF(@prev <> ordered_data.country, @rownum:=1, @rownum:= @rownum +1) AS row_number, ordered_data.loadEventEnd, @prev:= ordered_data.country, ordered_data.country AS country FROM (
       SELECT event_loadEventEnd AS loadEventEnd, event_originCountry AS country
       FROM log.NavigationTiming_15033442
       WHERE timestamp BETWEEN '20151222' AND '20160209' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki' AND event_mobileMode IS NOT NULL
       ORDER BY country, loadEventEnd) ordered_data
  ORDER BY country, loadEventEnd) AS numberedlist
INNER JOIN
  (SELECT COUNT(*) AS total_rows, event_originCountry AS country FROM log.NavigationTiming_15033442
  WHERE timestamp BETWEEN '20151222' AND '20160209' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki' AND event_mobileMode IS NOT NULL GROUP BY country)
  AS rows_per_country
ON numberedlist.country = rows_per_country.country
WHERE
numberedlist.row_number >= rows_per_country.total_rows/2 AND
numberedlist.row_number <= rows_per_country.total_rows/2+1
GROUP BY numberedlist.country
ORDER BY numberedlist.loadEventEnd;

Distribution per country:

SELECT FLOOR(LOG2(event_loadEventEnd/1000)*2)/2 AS bucket, event_originCountry AS country, COUNT(*) FROM log.NavigationTiming_15033442 WHERE timestamp BETWEEN '20151222' AND '20160209' AND event_loadEventEnd IS NOT NULL AND wiki = 'enwiki' AND event_mobileMode IS NOT NULL GROUP BY bucket, country ORDER BY bucket, country

The spreadsheet with the source data and chart is here, again with an easily changeable country selection.

To recap some of the off-Phab discussion about this:

There are various other interesting directions one can explore about this data from here on. We should figure out what the most important questions are for the purpose of informing actual product decisions. One early thought was to quantify how much connection speed varies among readers in a particular (or in the average) country (using a suitable dispersion measure, e.g. median absolute deviation), but that doesn't seem too relevant right now. For pursuing the idea of using geolocation information to decide whether to serve a version optimized for slow connections, one could define a threshold speed, above which we assume that that version is the better choice for the reader; we could then quickly find all countries where (say) more than 90% of pageviews occur with a slower speed. Etc.

My 2 cents:

If you want to evaluate actual loading speeds you also need to take a look at the 90th percentile as performance-wise you want to examine the loading event if most of resources come from cache (median) and if is a complete fresh request (90th percentile). The latency distribution is bimodal.

Without those two numbers you cannot know if improvements in performance are truly effective. Or even, what performance really is.

My 2 cents:

If you want to evaluate actual loading speeds you also need to take a look at the 90th percentile as performance-wise you want to examine the loading event if most of resources come from cache (median) and if is a complete fresh request (90th percentile). The latency distribution is bimodal.

Without those two numbers you cannot know if improvements in performance are truly effective. Or even, what performance really is.

Thanks Nuria - I already read your thoughtful and interesting Wikitech notes, including the observation that "latency is bimodal"before running these queries. It is however my understanding that on Wikimedia sites, considerably more than 90% of pageviews are cached (96-98% at one point in late 2014), i.e. that this assumption about the 90th percentile is not true on our case. The distribution graphs above appear to confirm this: while that second uncached mode may surely exist, it is not clearly discernable globally and for these example countries, at least not at these resolutions and sample sizes. Anyway, I'll let the performance team weigh in, who have the most experience in measuring this kind of thing and have been relying on medians quite frequently.

@Tbayer: Really Nice Maps! Would love a link to the talk if any.

It is however my understanding that on Wikimedia sites, considerably more than 90% of pageviews are cached (96-98% at one point in late 2014), i.e. that this assumption about the 90th percentile is not true on our case

This is incorrect, I am talking about client side caching. not server side. Server side you are correct, our requests are cached to a high degree. Client side no, our users request data (just like any web user) with a mix of full and cold caches. Think that every time we deploy a new javascript version of a file every user needs to fetch it and use it until we update it again. This fetching times are going to show on a 90th percentile, not a median. Let me know if this makes sense.

Okay to close this task?

Yes, from my perspective it was all done some months ago ;)
@ori , I assume you don't need anything else here currently? Happy to do followup analyses as needed. (There have been some other uses for this data since, e.g. here.)

@Tbayer: Really Nice Maps! Would love a link to the talk if any.

Thanks! The complete slides are here in case they didn't cross your path in the meantime: https://commons.wikimedia.org/wiki/File:New_readership_data_(Wikimedia_Foundation_Tech_Talk).pdf

It is however my understanding that on Wikimedia sites, considerably more than 90% of pageviews are cached (96-98% at one point in late 2014), i.e. that this assumption about the 90th percentile is not true on our case

This is incorrect, I am talking about client side caching. not server side. Server side you are correct, our requests are cached to a high degree. Client side no, our users request data (just like any web user) with a mix of full and cold caches. Think that every time we deploy a new javascript version of a file every user needs to fetch it and use it until we update it again. This fetching times are going to show on a 90th percentile, not a median. Let me know if this makes sense.

The percentage quoted was actually correct ;) but yes, I was talking about server-side caching, and I agree that client-side caching should be taken into account too. That however also means that the "Latency is bimodal" claim was too simplistic in another sense as well (two different kind of caching would correspond to four modes instead of two).
In any case, as we have seen here, it is at odds with the actual data from the English Wikipedia. I have left a comment on the talk page of the Wikitech documentation page: https://wikitech.wikimedia.org/wiki/Talk:Performance/Analytics#.22Latency_is_bimodal.22.2C_and_caching . As before, I guess that the performance team may have other valuable insights.

That however also means that the "Latency is bimodal" claim was too simplistic in another sense as well (two different kind of caching would correspond to four modes >instead of two).

Given the percentage of our backend caching (i.e. we end up serving mostly from varnish >90% of the time) the signal you are most likely to see is the bimodality due to frontend caching. So, no, you would not see four modes easily in our data. In enwiki you are bundling up all users all countries and thus the bimodality is affected by connection speeds. Makes sense? That doesn't mean that the median is a good representation overall of our user base. Reduce your analysis to users with similar connections and locations and you will see the difference. These are well known issues that affect any website deployed globally, not just enwiki.

See: https://www.igvita.com/2012/04/04/measuring-site-speed-with-navigation-timing/

Krinkle triaged this task as Medium priority.
Krinkle subscribed.

Marking as Resolved. @Tbayer Thanks for the data! This is now on Radar/Tracking in both our workboards, so nothing further is happening.