We probably have some old tasks related to this, but I wanted to use this task as an ideas dumping ground, separate from the parent task, for ideas that come up while I review research papers.
The value of performance stability versus average performance
We've floated the idea a few times in the past that we could study performance perception in the real world by making the site slower on purpose for a group of users and studying the effect on their behavior. A refinement on that idea would be to compare two conditions: high random variance in performance versus bad but consistent performance.
It's possible that a website that runs fast 99% of the time but has a very slow page load every now and then is more frustrating to the user than one whose performance is consistently average.
In the context of what our team is doing, if one phenomenon is a lot worse than the other, this would dramatically shift our focus. If consistency is the most important thing, tackling high percentiles should be our main focus. If faster response is the most important factor, it confirms that our current approach of focusing on making things faster across the board is the right one.
This could be studied in a controlled environment or "in the wild" by intentionally slowing down page loads. The challenge lies in how we measure that users are more satisfied with one scenario than another. By asking them? By measuring session length?
This might work as an opt-in study, with a browser plugin that either doesn't affect load time, inserts randomly slow pageloads, or slows down every pageload by whatever amount is needed to make page load time very consistent. We would then measure time spent on wikis over a long period of time.
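As a rough illustration of what such a plugin could look like, here is a minimal sketch of a Firefox-style WebExtension background script that holds back top-level page loads. Everything in it is an assumption for illustration: the group assignment, the delay values, the URL pattern, and the use of a Promise-returning blocking webRequest listener (a Firefox capability) rather than whatever mechanism the real study would use.

```typescript
// background.ts — minimal sketch, not a real study implementation.
// Assumes "webRequest" and "webRequestBlocking" permissions plus host
// permissions for the wikis under study.
declare const browser: any;

type Group = 'control' | 'random' | 'consistent';

// Hypothetical: the group would be assigned once per participant at enrolment.
declare function assignedGroup(): Group;
const group: Group = assignedGroup();

const CONSISTENT_DELAY_MS = 1500;      // every pageload gets this extra latency
const RANDOM_SLOW_PROBABILITY = 0.05;  // ~5% of pageloads are made very slow
const RANDOM_SLOW_DELAY_MS = 6000;

function delayForThisLoad(): number {
  switch (group) {
    case 'control':
      return 0;
    case 'consistent':
      return CONSISTENT_DELAY_MS;
    case 'random':
      return Math.random() < RANDOM_SLOW_PROBABILITY ? RANDOM_SLOW_DELAY_MS : 0;
  }
}

// In Firefox, a blocking webRequest listener may return a Promise; the
// request is held until the promise resolves, which adds the artificial delay.
browser.webRequest.onBeforeRequest.addListener(
  (details: any) => {
    if (details.type !== 'main_frame') {
      return {};
    }
    const delay = delayForThisLoad();
    if (delay === 0) {
      return {};
    }
    return new Promise((resolve) => setTimeout(() => resolve({}), delay));
  },
  { urls: ['*://*.wikipedia.org/*'] },  // placeholder URL pattern
  ['blocking']
);
```

Note that adding a constant delay for the "consistent" group is a simplification; making total load time truly consistent would require measuring each load and padding it up to a target. Delaying only the main_frame request is also a simplification, since a real study would have to decide whether subresources and API requests get slowed as well.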
How results might be actionable: if we find that stability is more important than average performance, it might encourage us to focus on improving high percentiles and extreme cases more than performance across the board. If average performance matters more, this would reinforce our current focus.
Performance perception thresholds and granularity
In a controlled environment, it would be interesting to identify the latency threshold for the core mechanics we want to study (e.g. reading, editing).
There is a limit to what humans can perceive as being instantaneous, and it seems to depend on context (since studies suggest that audio and haptic latency thresholds are different). It might also depend on age and background, again with studies suggesting that younger people have lower latency thresholds.
This would tell us the limit beyond which optimizing is pointless, as people can't tell the difference. We could also use the same setup to measure how satisfied people are at different thresholds once they start perceiving latency. Taking a pessimistic example, if moving the needle from, say, 100ms to 30ms response time increases user satisfaction only from 80% to 82%, it might not justify some of the budget allocated to a project aiming to achieve such performance improvements.
Studying this would require a lab setup and a big enough cohort of participants.
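One common way to run such a lab study is an adaptive staircase procedure from psychophysics. Below is a minimal sketch, assuming a hypothetical presentTrial() helper that performs the same page action with an extra amount of latency and returns whether the participant reported noticing a delay.

```typescript
// staircase.ts — minimal sketch of a 1-up/1-down adaptive staircase for
// estimating a latency-perception threshold in a lab setting.
// presentTrial() is hypothetical: it would perform the same page action
// with `delayMs` of added latency and return whether the participant
// reported noticing the delay.
declare function presentTrial(delayMs: number): Promise<boolean>;

async function estimateThresholdMs(
  startMs = 300,
  stepMs = 20,
  reversalsWanted = 8
): Promise<number> {
  let delay = startMs;
  let lastNoticed: boolean | null = null;
  const reversalDelays: number[] = [];

  while (reversalDelays.length < reversalsWanted) {
    const noticed = await presentTrial(delay);
    if (lastNoticed !== null && noticed !== lastNoticed) {
      reversalDelays.push(delay); // the staircase changed direction here
    }
    lastNoticed = noticed;
    // Noticed the delay: shrink it. Missed it: grow it.
    delay = Math.max(0, delay + (noticed ? -stepMs : stepMs));
  }

  // Average the delays at the reversal points as the threshold estimate.
  return reversalDelays.reduce((sum, d) => sum + d, 0) / reversalDelays.length;
}
```

A simple 1-up/1-down rule like this converges on the delay participants detect about 50% of the time; other rules (e.g. 2-down/1-up) target different detection rates and could be swapped in depending on how the study defines "threshold".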
How results might be actionable: knowing what users consider to be an "instant wiki pageload" would tell us the point beyond which further optimization is futile. Furthermore, identifying the granularity of a perceivable performance difference would inform decisions about whether to pursue a given optimization, for instance when the expected savings fall below what users can perceive as a difference.
Measure user interaction with JS-enhanced elements
Some of the above ideas assume that visual progress is king. If we run a survey and ask people to tell us how fast the page loaded, they will probably assume that we're talking about visual progress, not about page interaction. However, on pages where people's main task is to interact with JS-enhanced elements, fast visual progression might not be the main source of frustration if the interactive elements people are waiting for aren't usable once visual progress is complete. The most obvious example is the visual editor, which is highly interactive and enhanced by JS.
Therefore I think it would be interesting to measure the user interaction rate with JS-enhanced features on all pages, to see if any patterns emerge. This might help us distinguish pages where optimizing for faster interactivity is the most important factor for user satisfaction from those where faster visual completion matters most.
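A minimal sketch of what that instrumentation could look like, assuming a hypothetical data-js-enhanced attribute marking the relevant elements and a hypothetical logInteraction() beacon (real code would hook into the existing event-logging pipeline instead):

```typescript
// instrumentation.ts — minimal sketch, not production code.
// Assumes a hypothetical data-js-enhanced attribute on JS-enhanced elements
// and a hypothetical logInteraction() beacon.
declare function logInteraction(event: Record<string, unknown>): void;

const ENHANCED_SELECTOR = '[data-js-enhanced]'; // placeholder marker

// Approximate "visually complete" with the window load event; a real
// implementation would reuse whatever visual-completion metric we already collect.
let visuallyCompleteAt: number | null = null;
window.addEventListener('load', () => {
  visuallyCompleteAt = performance.now();
});

// Log clicks on enhanced elements, relative to visual completion.
document.addEventListener(
  'click',
  (e) => {
    const target = (e.target as Element | null)?.closest(ENHANCED_SELECTOR);
    if (!target) {
      return;
    }
    logInteraction({
      page: location.pathname,
      feature: target.getAttribute('data-js-enhanced'),
      msAfterVisualComplete:
        visuallyCompleteAt === null
          ? null
          : Math.round(performance.now() - visuallyCompleteAt),
    });
  },
  true
);
```

Aggregating events like these per page would give both the interaction rate with JS-enhanced features and how long after visual completion those interactions happen, which is the signal needed to separate the two optimization targets.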