Performance perception: how satisfied are Wikipedia users?

We've recently published research on performance perception that we did last year. The micro survey used in this study is still running on multiple Wikipedia languages and gives us insights into perceived performance.

The micro survey simply asks users on Wikipedia articles, in their own language, if they think that the current page loaded fast enough:

Capture_d_écran_2018-04-30_10.09.02.png (478×483 px, 46 KB)

Let's look at the results on Spanish and Russian Wikipedias, where we're collecting the most data. We have collected more than 1.1 million survey responses on Spanish Wikipedia and close to 1 million on Russian Wikipedia so far. The survey is displayed to a small fraction of our visitors.

How satisfied are our visitors with our page load performance?

Capture d'écran 2019-05-29 19.00.21.png (286×1 px, 50 KB)

Ignoring neutral responses ("I'm not sure"), we see that, consistently across wikis, between 85% and 90% of visitors find that the page loaded fast enough. That's an excellent score, one that we can be proud of. And it makes sense, considering that Wikipedia is one of the fastest websites on the Web.
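As a minimal sketch of how such a ratio can be computed, the snippet below drops neutral answers before dividing. The response labels (`"yes"`, `"no"`, `"not_sure"`) and the sample data are assumptions made for illustration; they are not the survey's actual schema.

```python
from collections import Counter


def satisfaction_ratio(responses):
    """Share of positive answers among non-neutral responses.

    `responses` is an iterable of hypothetical labels:
    "yes", "no" or "not_sure" (the neutral answer).
    """
    counts = Counter(responses)
    decided = counts["yes"] + counts["no"]  # neutral answers are ignored
    if decided == 0:
        return None  # nothing to compute a ratio from
    return counts["yes"] / decided


# Made-up sample: 17 positive, 3 negative, 5 neutral answers.
sample = ["yes"] * 17 + ["no"] * 3 + ["not_sure"] * 5
print(f"{satisfaction_ratio(sample):.1%}")  # 85.0%
```

Leaving neutral answers out of the denominator means the ratio only reflects respondents who expressed an opinion.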

Now, a very interesting finding is that this satisfaction ratio varies quite a bit depending on whether you're logged into the website or, like most Wikipedia visitors, logged out:

| wiki | status | sample size | satisfaction ratio |
| --- | --- | --- | --- |
| spanish | logged in | 1,500 | 89.70% |
| spanish | logged out | 1,109,205 | 85.82% |
| russian | logged in | 7,093 | 92.28% |
| russian | logged out | 885,926 | 85.82% |

It appears that logged-in users are consistently more satisfied with our performance than logged-out visitors.

The contributor performance penalty

Andres Apevalov — Press team of Prima Vista Literature Festival, CC BY-SA 4.0

What's very surprising about logged-in users being more satisfied is that we know for a fact that the logged-in experience is slower, because our logged-in users have to reach our master datacenter in the US instead of hitting the cache point of presence closest to them. This is a long-standing technical limitation of our architecture, one we intend to resolve one day.

Why could they possibly be happier, then?

The Spanish paradox

Map-Hispanophone_World.png (628×1 px, 32 KB)

Spanish Wikipedia, at first glance, seems to contradict this phenomenon of slower page loads for logged-in users. Looking at the desktop site only (to rule out differences in the mobile/desktop mix):

| wiki | status | median loadEventEnd (ms) |
| --- | --- | --- |
| spanish | logged in | 1400.5 |
| spanish | logged out | 1834 |
| russian | logged in | 1356 |
| russian | logged out | 1075 |

The reason why Spanish Wikipedia page loads seem faster for logged-in users, contrary to what we see on other wikis and at a global scale, is that Spanish Wikipedia traffic has a very peculiar geographic distribution. Logged-in users are much more likely than their logged-out counterparts to be based in Spain rather than in Latin American countries (30.04% versus 22.3%). Since internet connectivity tends to be faster in Spain, this difference in geographic mix explains why the logged-in experience appears to be faster, but isn't, when looking at RUM data at the website level.

This is a very common pitfall of RUM data, where seemingly contradictory results can emerge depending on how you slice the data. RUM data has to be studied from many angles before drawing conclusions.
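The effect can be reproduced with a small worked example. In the sketch below, the Spain traffic shares are the ones quoted above, but the per-region load times are invented purely for illustration, all traffic outside Spain is lumped into a single hypothetical "elsewhere" group, and means are used instead of medians because they combine by simple weighting.

```python
# Hypothetical mean load times in ms per (region, status).
# Logged-in is slower than logged-out inside every region.
load_ms = {
    ("spain", "logged_in"): 1100,
    ("spain", "logged_out"): 1000,
    ("elsewhere", "logged_in"): 2600,
    ("elsewhere", "logged_out"): 2500,
}

# Share of each group's traffic that comes from Spain (figures from the post).
spain_share = {"logged_in": 0.3004, "logged_out": 0.223}


def overall_mean(status):
    """Traffic-weighted mean load time across both regions."""
    share = spain_share[status]
    return (share * load_ms[("spain", status)]
            + (1 - share) * load_ms[("elsewhere", status)])


for status in ("logged_in", "logged_out"):
    print(status, round(overall_mean(status), 1))
```

With these made-up numbers the aggregate for logged-in users comes out lower than the aggregate for logged-out users, even though logged-in users are 100 ms slower in both regions. The geographic mix alone is enough to flip the comparison.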

Caching differences

Looking at the Navigation Timing data we collect for survey respondents, we see that the median connect time on Spanish Wikipedia is 0 ms for logged-in users and 144 ms for logged-out users. A connect time of 0 means the browser was already connected to our domain, which suggests that logged-in users view a lot of pages and that the survey mostly ends up being displayed on their nth viewed page, where n is greater than 1. For a lot of logged-out users, by contrast, we capture their first page load, with a higher probability of a cold cache. Logged-in users, despite having a (potential) latency penalty of connecting to the US, therefore tend to have more cached assets, particularly the JS and CSS needed by the page. This doesn't fully compensate for the performance penalty of connecting to a potentially distant datacenter, but it might reduce the variability of performance between page loads.
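For reference, a connect time can be derived from two Navigation Timing fields, `connectStart` and `connectEnd`, which are equal when the browser reuses an existing connection. The sketch below assumes the entries have been exported as plain dictionaries; the sample values are invented and only chosen to mirror the medians quoted above.

```python
from statistics import median


def connect_time(entry):
    """Connection setup time in ms; 0 when an existing connection is reused."""
    return entry["connectEnd"] - entry["connectStart"]


def median_connect_time(entries):
    return median(connect_time(e) for e in entries)


# Invented samples: most logged-in page views reuse a connection.
logged_in = [
    {"connectStart": 5, "connectEnd": 5},
    {"connectStart": 3, "connectEnd": 3},
    {"connectStart": 10, "connectEnd": 150},
]
logged_out = [
    {"connectStart": 0, "connectEnd": 144},
    {"connectStart": 2, "connectEnd": 2},
    {"connectStart": 1, "connectEnd": 201},
]

print(median_connect_time(logged_in))   # 0
print(median_connect_time(logged_out))  # 144
```

A median of 0 therefore indicates that more than half of the sampled page views happened on an already open connection.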

In order to further confirm this theory, in the future we could try to record information about how much of the JS and CSS was already available in the browser cache at the time the page load happened. This is not information we currently collect. Such data would allow us to confirm whether or not satisfaction is correlated with how well cached dependencies are, regardless of the user's logged-in/logged-out status.
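One possible way to approximate this, sketched here under assumptions rather than as a description of any existing instrumentation, is to use Resource Timing entries: a `transferSize` of 0 combined with a non-zero `decodedBodySize` generally indicates that a resource was served from the local cache. The records and file names below are hypothetical and assumed to be already filtered down to the JS and CSS requests of a single page load.

```python
def cached_share(resources):
    """Fraction of resources that appear to come from the browser cache.

    Each record is a dict with `transferSize` and `decodedBodySize`
    as reported by Resource Timing. Cross-origin entries without timing
    permission report 0 for both fields, so they are not counted as cached.
    """
    if not resources:
        return None
    cached = [
        r for r in resources
        if r["transferSize"] == 0 and r["decodedBodySize"] > 0
    ]
    return len(cached) / len(resources)


# Hypothetical JS/CSS requests for one page load.
page_load = [
    {"name": "startup.js", "transferSize": 0, "decodedBodySize": 42000},
    {"name": "site.css", "transferSize": 0, "decodedBodySize": 9000},
    {"name": "gadget.js", "transferSize": 5300, "decodedBodySize": 18000},
    {"name": "skin.css", "transferSize": 2100, "decodedBodySize": 7000},
]
print(cached_share(page_load))  # 0.5
```

A per-page-load share like this could then be compared against the survey answer for the same page view.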

Brand affinity?

Wikilove2.png (600×800 px, 290 KB)

Becoming a Wikipedia contributor, and therefore logging in, requires a certain affinity to the Wikipedia project. It's possible, as a result, that logged-in users have a more favourable view of Wikipedia than logged-out users on average, and that this positive outlook influences how they judge the performance of the website.

This is a theory we will explore in the future by asking more questions in the micro survey, in order to determine whether or not the user who responds has a positive view of our website in general. This would allow us to quantify how large the effect of brand affinity might be on performance perception.

Written by Gilles on May 29 2019, 5:17 PM.
Engineering Manager, WMF

Event Timeline

Another possible explanation for logged-in editors rating it more highly is that they might be long-time contributors who remember how long it took for large articles to load, many years ago. I remember [[Canada]] used to take ~45 seconds to load. Now it takes 5 seconds! I even went and checked the history of the article's size to see if it used to be much bigger, but no: it used to hover around 130kb, but nowadays it's up to 230kb. Y'all are amazing. <3

@Quiddity do you mean the time it takes to open that article in one of the editor modes? That wasn't captured here, we're only looking at plain article views. Which for logged-in users, while not hitting edge caches, should hit the parser cache and be much faster than 5s nowadays.

As for your main point that editors who have been around for a long time have felt the drastic improvements of performance over the years, we can actually find that out indirectly based on number of edits. Here's the data for Spanish Wikipedia logged-in users since the survey started:

| # of edits | sample size | satisfaction ratio |
| --- | --- | --- |
| 0 edits | 134 | 91.7% |
| 1-4 edits | 176 | 92.2% |
| 5-99 edits | 356 | 94.49% |
| 100-999 edits | 208 | 96.41% |
| 1000+ edits | 549 | 88.03% |

This seems to suggest that indeed, the more experienced the editor, the higher they rate the performance, whether from witnessing improvements over time, from their affinity to the project growing with experience, or both.

While the last bucket (1000+ edits) seems to contradict this pattern, I think it's too coarse a bucket. It might capture extremely active editors, who might get annoyed by the survey because their high activity means that they see it multiple times. On Spanish Wikipedia editors see the micro survey 10 times less frequently than readers, but still, I can imagine that someone very prolific would have seen it multiple times over the course of a few months and could get annoyed by it. Even so, the satisfaction ratio for that bucket (88.03%) is still higher than for logged-out users (85.82%).
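Since the per-bucket samples are small, it can help to put an uncertainty range around each ratio before comparing buckets. The sketch below computes a Wilson score interval at roughly 95% confidence. It assumes that the sample size in the table is the denominator of the satisfaction ratio, which the data above does not confirm.

```python
import math


def wilson_interval(p, n, z=1.96):
    """Wilson score interval for an observed proportion p over n answers.

    z=1.96 corresponds to a confidence level of roughly 95%.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# 1000+ edits bucket: 88.03% satisfied, assumed n = 549.
low, high = wilson_interval(0.8803, 549)
print(f"{low:.1%} to {high:.1%}")
```

Under that assumption the interval spans roughly 85% to 90%, which still includes the logged-out ratio of 85.82%, so the gap between this bucket and logged-out users should be read with some caution.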

@Gilles Nope, I meant just opening a very large article to read.
Years ago (2010 era? maybe earlier or later), on my outdated hardware, it often took 30-45 seconds to fully load a very-large article/page to read; although I've always had a profusion of gadgets and userscripts installed which I know adds to the load-time.
Now it takes 4.1 seconds to load the Obama article, which is amazing! F29280932 - I tried opening a handful of links in https://en.wikipedia.org/wiki/Special:LongPages and most of those are even faster, at 2-3 seconds.
I know (thanks to Krinkle) that there are other changes which have contributed to this such as better global networks and better internal routing configurations, plus my own upgraded hardware, and better browsers, but still. Much Kudos to you all. :-)

Cool research!

The "Spanish paradox" is called Simpson's paradox, FWIW.