
Ideas for performance perception studies
Closed, Declined · Public

Description

We probably have some old tasks related to this, but I wanted to use this task as an ideas dumping ground separate from the parent task, while I review research papers and ideas come up.

The value of performance stability versus average performance

We've floated the idea a few times in the past that we could study performance perception in the real world by making the site slower on purpose for a group of users and studying the effect on their behavior. I think a refinement of that idea would be to introduce both high random variance in performance and bad but consistent performance.

It's possible that a website that runs fast 99% of the time but has a very slow page load every now and then is more frustrating to the user than one whose performance is consistently average.

In the context of what our team is doing, if one phenomenon is a lot worse than the other, this would dramatically shift our focus. If consistency is the most important thing, tackling high percentiles should be our priority. If faster average response is the most important factor, it confirms that our current approach of making things faster across the board is the right one.

This could be studied in a controlled environment or "in the wild" by intentionally slowing down page loads. The challenge lies in how we measure that users are more satisfied with one scenario than another. By asking them? By measuring session length?

This might work as an opt-in study with a browser plugin that either doesn't affect load time, inserts randomly slow pageloads, or slows down all pageloads so that page load time becomes very consistent. We would then measure time spent on wikis over a long period of time.
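As a rough sketch of what such a plugin's delay logic might look like (the study arm names, probabilities, and delay values below are invented for illustration, not an actual implementation):

```
type StudyArm = 'control' | 'random-spikes' | 'consistent-slow';

// Illustrative parameters only; a real study would tune these.
const SPIKE_PROBABILITY = 0.01;   // ~1% of pageloads get a large spike
const SPIKE_DELAY_MS = 5000;      // size of the occasional spike
const CONSISTENT_DELAY_MS = 1500; // flat delay applied to every pageload

function computeArtificialDelay(arm: StudyArm): number {
  switch (arm) {
    case 'control':
      return 0;                   // baseline: no artificial delay
    case 'random-spikes':
      // Fast most of the time, occasionally very slow.
      return Math.random() < SPIKE_PROBABILITY ? SPIKE_DELAY_MS : 0;
    case 'consistent-slow':
      return CONSISTENT_DELAY_MS; // always slower, but predictable
  }
}

// Example: hide the page until the artificial delay has elapsed.
async function applyDelay(arm: StudyArm): Promise<void> {
  const delayMs = computeArtificialDelay(arm);
  if (delayMs > 0) {
    document.documentElement.style.visibility = 'hidden';
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    document.documentElement.style.visibility = '';
  }
}
```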

How results might be actionable: if we find that stability is more important than average performance, it might encourage us to focus on improving high percentiles and extreme cases rather than performance across the board. If average performance matters more, this would reinforce our current focus.

Performance perception thresholds and granularity

In a controlled environment, it would be interesting to identify the latency threshold for the core mechanics we want to study (e.g. reading, editing).

There is a limit to what humans can perceive as being instantaneous, and it seems to depend on context (since studies suggest that audio and haptic latency thresholds are different). It might also depend on age and background, again with studies suggesting that younger people have lower latency thresholds.

This would tell us the limit beyond which optimizing is pointless, because people can't tell the difference. We could also use this study to measure how satisfied people are at different thresholds once they start perceiving latency. Taking a pessimistic example, if moving the needle from, say, 100ms to 30ms response time increases user satisfaction only from 80% to 82%, it might not justify some of the budget allocated to a project aiming to achieve such performance improvements.

Studying this would require a lab setup and a big enough cohort of participants.

How results might be actionable: knowing what users consider to be an "instant wiki pageload" would tell us the point beyond which further optimization is futile. Furthermore, identifying the granularity of a perceivable performance difference would inform decisions about whether to pursue a given optimization, since the expected savings might fall below the threshold of what users would perceive as different.
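One simple way to estimate such a threshold in the lab would be an adaptive staircase: add an artificial delay to an interaction, ask the participant whether it felt delayed, and adjust the delay up or down. The sketch below is purely illustrative; the 1-up/1-down rule, step size, and starting point are assumptions, not a worked-out psychophysical design:

```
// Hypothetical staircase for estimating the just-noticeable added latency.
interface StaircaseState {
  delayMs: number;      // artificial latency currently being tested
  reversals: number[];  // delays at which the answer direction flipped
  lastAnswer?: boolean; // whether the previous trial was perceived as delayed
}

const STEP_MS = 10;   // assumed step size
const START_MS = 200; // assumed starting delay

function nextTrial(state: StaircaseState, perceivedDelay: boolean): StaircaseState {
  // If the participant noticed the delay, make the next trial harder (shorter
  // delay); if they did not, make it easier (longer delay).
  const delta = perceivedDelay ? -STEP_MS : STEP_MS;
  const reversed = state.lastAnswer !== undefined && state.lastAnswer !== perceivedDelay;
  return {
    delayMs: Math.max(0, state.delayMs + delta),
    reversals: reversed ? [...state.reversals, state.delayMs] : state.reversals,
    lastAnswer: perceivedDelay,
  };
}

// The threshold estimate is typically the mean of the last few reversal points.
function estimateThreshold(state: StaircaseState, lastN = 6): number {
  const tail = state.reversals.slice(-lastN);
  return tail.reduce((a, b) => a + b, 0) / Math.max(1, tail.length);
}

// Example: start at 200 ms and update as the participant answers each trial.
let state: StaircaseState = { delayMs: START_MS, reversals: [] };
state = nextTrial(state, true); // participant noticed the 200 ms delay
```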

Measure user interaction with JS-enhanced elements

Some of the above ideas assume that visual progress is king. If we run a survey and ask people to tell us how fast the page loaded, they will probably assume that we're talking about visual progress, not about page interaction. However, on pages where people's main task is to interact with JS-enhanced elements, fast visual progression might not be what drives frustration if the interactive elements people are waiting for aren't usable once visual progress is complete. The most obvious example is the visual editor, which is highly interactive and enhanced by JS.

Therefore I think it would be interesting to measure user interaction rates with JS-enhanced features on all pages, to see if any patterns emerge. This might help us categorize pages where optimizing for faster interactivity matters most for user satisfaction, versus pages where optimizing for faster visual completion matters most.
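As a hedged illustration of how this could be instrumented with the Event Timing API plus basic click counting (the CSS selector, duration threshold, and any reporting are placeholders, not existing instrumentation):

```
// Count clicks on JS-enhanced widgets, to see how often users interact with
// them per pageload. The selector is a placeholder.
let enhancedClicks = 0;
document.addEventListener('click', (e) => {
  const target = e.target as Element | null;
  if (target && target.closest('.oo-ui-widget')) {
    enhancedClicks++;
  }
});

// Observe slow input events via the Event Timing API; each entry reports how
// long the browser took to respond to a click, key press, etc.
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // entry.duration: time from user input until the next paint after handlers ran
    console.log(entry.name, Math.round(entry.duration), 'ms');
  }
});
observer.observe({ type: 'event', buffered: true, durationThreshold: 100 });

// On unload, one could beacon { enhancedClicks, slow-event counts } somewhere
// for aggregation; the endpoint and schema would need to be defined.
```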


Event Timeline

Gilles created this task.
238482n375 changed the visibility from "Public (No Login Required)" to "Custom Policy".
This comment was removed by Krinkle.
Aklapper changed the visibility from "Custom Policy" to "Public (No Login Required)".
Gilles lowered the priority of this task from Low to Lowest. May 27 2019, 5:05 AM

I've removed the gaze idea, since our early experiments with a gaze-tracking device have shown that this might be a distorted way to look at things. Subjects seemed to be unaware that their eyes had locked onto things and couldn't correlate this with higher interest. Also, gaze behaviour varied a lot between subjects, which suggests that we wouldn't find a one-size-fits-all solution to prioritisation. Our vision doesn't work like a laser-like tunnel, which is the model these devices end up pushing you towards.

Removed the video-based idea, as there are enough studies of that kind, and the task becomes a "game", disconnected from the reality of the pageload's effect on people's perception.

Actually, let's close this. We're already going to answer the questions these ideas were trying to answer with some of the already-filed follow-ups to the perception study (variability), and with new APIs (event timing + tracking clicks) for the interaction part.