
Performance review #2 of Hovercards (Popups extension)
Closed, Resolved, Public

Description

This is the bug to track the requested second performance review before this feature is graduated out of Beta Features.
We would like to seek advice from the performance team with regards to the code before pushing to production.

Details

Reference
bz68861

Related Objects

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:36 AM
bzimport set Reference to bz68861.
bzimport added a subscriber: Unknown Object (MLST).

Ori: btw, I don't know of an explicit request to do this RIGHT NOW, so don't worry yet. I just filed this so we don't lose track.

@Jdlrobson this is something we should probably line up with necessary folks before quarter starts.

@greg do you happen to have perf test 1 handy?

Adding @ovasileva, who is the new Reading Web product manager and will be prioritizing work.

@greg do you happen to have perf test 1 handy?

https://phabricator.wikimedia.org/search/query/8EaCg1daD5tT/#R "hovercards performance" search, second result after this task is T88170 :P

I won't call this a thorough review per se, but it appears to me that even if caching of the API call does work for logged-out users, it's still only set to 5 minutes.

If you consider the case of a large wiki with this feature deployed to all users, including logged-out ones, this is insufficient caching. Not just because the duration is short, but also because the distribution of hovercard traffic makes it quite likely that one would open a hovercard nobody else has opened in the last 5 minutes, creating a lot of cache misses.

This feature is akin to quickly opening a mini-article. As such, it's my opinion that its caching should be as efficient as viewing articles in the same context, which, for logged-out users, should mean 30-day caching and purging when articles get updated. The latter is impractical at the moment and depends on upcoming features like XKey availability in Varnish.

I would argue that this feature cannot be deployed as a default, or even opt-in to logged-out users until T131503: Convert text cluster to Varnish 4 is resolved. Thankfully it looks like the wait won't be that long, since this task is part of Ops' quarterly goals this quarter.

Thanks for the guidance, @Gilles. I think we're in agreement about solving the caching problem before deployment. The general deployment isn't targeted for Q2.

One of the Q2 FY 2016-2017 goal subtasks, though, is T123445: Add support for RESTBase endpoint consumption, for use of the summary endpoint, which has a 14 day s-maxage window, observes origin initiated purges, and is already used in app contexts. The thinking here is to get the different web and app experiences using the same endpoint as much as possible, ensuring warm caches, consistently generated content atoms, and less unnecessary origin load.

+ @GWicke, @ovasileva (Reading Web product manager), @bmansurov (Reading Web engineer rotated into tech lead role for Q2 FY 2016-2017)

The RESTBase summary endpoint will indeed take care of caching. It is updated and purged whenever a part of the summary content is modified. This includes changes to the pageimage, as well as edits to the Wikidata description.
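(For illustration, a minimal sketch of what consuming the summary endpoint might look like from client code, assuming the public /api/rest_v1/page/summary/{title} route; the helper name and the subset of response fields shown here are simplified assumptions, not the Popups extension's actual code.)

```typescript
// Sketch only: fetch a page summary from the REST v1 summary endpoint and
// return a few of its fields. Responses are served from the edge caches and
// purged by the origin when the page or its description changes, per the
// comments above.
interface PageSummary {
  title: string;
  extract: string;
  description?: string;
  thumbnail?: { source: string; width: number; height: number };
}

async function fetchSummary(title: string): Promise<PageSummary> {
  const url = `https://en.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(title)}`;
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Summary request failed: ${response.status}`);
  }
  return response.json() as Promise<PageSummary>;
}
```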

Is this still needed, @greg? Not sure what the process for deployments is these days.

We will not reach the magic number of 100 ms for hover (https://www.nngroup.com/articles/response-times-3-important-limits/) with the current structure, so the hovercards will not feel instantaneous. I think we should have a task for fixing this in the future.

@ovasileva: It should be trivial to switch Page Previews to consume a different API endpoint. I'd prioritise reviewing the RESTBase endpoint and ensuring that it returns exactly what we need.

… ensuring that it returns exactly what we need.

It does.

Yes, we are still planning on switching to it.

I think whatever API backend we use, we will have a really hard time reaching the 100 ms goal all over the world. Latency will matter. I'm not saying we shouldn't release it before reaching that, but I think it should be our long-term goal :)

Is this still needed, @greg? Not sure what the process for deployments is these days.

(sorry for the slow response, I'm catching up on bugmail): Yes, this was agreed to be done before further rollout, I believe.

Gilles raised the priority of this task from Medium to High. Dec 7 2016, 8:40 PM
Gilles lowered the priority of this task from High to Medium.
Gilles set Security to None.

Following up on our Q3 FY 2016-2017 interlock, @ovasileva, @bmansurov: @Gilles noted that the Performance Team would like to conduct a second performance review after the RESTBase page summary endpoint change is in place, before we start launching. You should also ensure @BBlack and @elukey from Traffic are in the know, particularly for an eye on the Varnish objects as we start enabling the feature (the current API results in edge caching and so will the new one; in any case, once we turn this on, the coverage of pages and thus objects is noteworthy).

You should also ensure @BBlack and @elukey from Traffic are in the know,

Better to ask @ema; I am on the Analytics team :)

Where can we test the RESTBase API version of hovercards?

It's not merged yet but will be soon. See T156800. We plan to deploy within 2 weeks to beta features: T158221

Where can I find how we reason about the extra delay of 500 ms and the 200 ms fade to view as described in https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews?panelId=4&fullscreen ? I checked the code, but do we have a task where we describe the full flow and the reasoning? I still think we should aim for 100 ms, which according to studies and UX gurus is the limit for feeling that the preview reacts instantaneously:

0.1 second: Limit for users feeling that they are directly manipulating objects in the UI. For example, this is the limit from the time the user selects a column in a table until that column should highlight or otherwise give feedback that it's selected. Ideally, this would also be the response time for sorting the column — if so, users would feel that they are sorting the table. (As opposed to feeling that they are ordering the computer to do the sorting for them.)

But now we are settling for the 1000 ms limit, right?

1 second: Limit for users feeling that they are freely navigating the command space without having to unduly wait for the computer. A delay of 0.2–1.0 seconds does mean that users notice the delay and thus feel the computer is "working" on the command, as opposed to having the command be a direct effect of the users' actions. Example: If sorting a table according to the selected column can't be done in 0.1 seconds, it certainly has to be done in 1 second, or users will feel that the UI is sluggish and will lose the sense of "flow" in performing their task. For delays of more than 1 second, indicate to the user that the computer is working on the problem, for example by changing the shape of the cursor.

What does that mean? For the end user, the feeling of choosing to show the preview is gone; instead, the user is asking the computer to show the preview.

I understand that it's hard to reach the 100 ms limit with how it is built now, so it's super important that we really stay under the 1000 ms limit.

If we aim to be under 1000 ms, I think we should remove the extra delay we add ourselves, because now we risk the p95/p99 values reaching the magic 1000 ms and making the preview feel sluggish. Or are the delays and fading based on other studies that contradict Miller and Nielsen?

@phuedx, is our total delay more than 1000 ms? I thought it was around 750?

@Peter

0.1 second: Limit for users feeling that they are directly manipulating objects in the UI. For example, this is the limit from the time the user selects a column in a table until that column should highlight or otherwise give feedback that it's selected. Ideally, this would also be the response time for sorting the column — if so, users would feel that they are sorting the table. (As opposed to feeling that they are ordering the computer to do the sorting for them.)

This applies to UI elements which are meant to be interacted with directly and have one purpose, for example a button. Instant feedback is needed when you press a button; the act of pressing is an indicator of a single intent from the user. Page previews, on the other hand, are an optional (and additional) interactive element. We are trying to intercept one action with two intents.

Here are the two intents:

  1. Just hovering over the text with my pointer, not really looking for any popups or anything
  2. I am looking for more information on this link

We want to serve both but not disturb the first one. Accidental UI popups are a major point of frustration for some. It's not only about performance and immediate feedback; it's also about being non-intrusive. Hence the artificial delay.

We conducted design research studies on page previews with this delay. Proper design research is a good way for our product process to evaluate usability. The research did not show any frustration around feedback times for a hover.

Accidental UI popups are a major point of frustration for some.

We're not proposing no delay at all, because mousing across links would create needless requests in the background, but rather a shorter delay. The existing delay seems to have been picked based on assumptions, not research.

Research did not show

That research doesn't show that different delays have been tested. Maybe that was done elsewhere?

Also, the fading in (or the sliding up, as I'm experiencing it on the beta version on enwiki) needlessly delays showing content that's available. The feature has already kicked in, and readability is artificially delayed. That's unnecessary; people should be able to read the content right when it's available.

There is a performance dashboard at https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews?from=now-7d&to=now. Wikis with page previews enabled by default for anonymous users are using RESTBase as the backend.

We're not proposing no delay at all, because mousing across links would create needless requests in the background, but rather a shorter delay. The existing delay seems to have been picked based on assumptions, not research.

Not entirely assumptions; we did secondary research based on other similar use cases around the web to match expectations. I'm not sure that is documented, but I will document it.

but a shorter delay

Let me talk to @phuedx to confirm the actual numbers, because I don't think the total delay was higher than 1s.

That research doesn't show that different delays have been tested. Maybe that was done elsewhere?

But it did not show that the current delay is a problem either. The cost of researching everything is very high; we probe things that surface as problems. Similarly, we did not test variations between white popovers and red popovers.

Also, the fading in (or the sliding up, as I'm experiencing it on the beta version on enwiki) needlessly delays showing content that's available. The feature has already kicked in, and readability is artificially delayed. That's unnecessary; people should be able to read the content right when it's available.

We already cut this animation time in half with the rewrite. The animation helps ease in a new UI element; without animation the experience is abrupt. It's set at 250ms if I am not wrong.

We're not proposing no delay at all, because mousing across links would create needless requests in the background, but rather a shorter delay. The existing delay seems to have been picked based on assumptions, not research.

Not entirely assumptions; we did secondary research based on other similar use cases around the web to match expectations. I'm not sure that is documented, but I will document it.

but a shorter delay

Let me talk to @phuedx to confirm the actual numbers, because I don't think the total delay was higher than 1s.

The main thing to keep in mind is network latency. These controlled experiments are not accounting for realistic ranges in variance of latency. If the design goal is for the preview to show up after 0.5s of hovering, with 1s as worst case scenario, that means you must start the network request before 0.5s has passed.

Starting immediately when the hover starts is, of course, also undesirable because it would cause a lot of wasted requests that would only end up ignored or aborted when the mouse goes elsewhere. Note that I'm not referring to unwanted previews but unwanted requests. These must be treated separately.

I would recommend that a request starts no later than 100ms after the hover first starts. 100ms should be more than enough to avoid the majority of requests that would end up not resulting in a preview. 100ms is also the default value for the jquery.hoverIntent plugin, which we've used at different points in time at Wikimedia. Sometimes we drive it up to 200ms based on what we find works best for the individual feature, but never more than 200ms as the default debounce.

If a client has a fast connection or otherwise ends up receiving the backend response before the 0.5s mark, you can always decide at that point to impose an artificial delay to make sure you don't show the user a preview before you're certain they're interested in this link. But perform the delay after the data has arrived, not before the data fetch starts. Otherwise rendering at 1s will be close to the average experience, instead of the worst.
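(A minimal sketch of that sequencing, assuming a 100ms debounce before the request starts and a 500ms earliest-display point; the function and constant names are hypothetical, fetchSummary and PageSummary come from the earlier sketch, and cancellation on mouse-out is omitted for brevity.)

```typescript
// Sketch only: start the fetch shortly after the hover begins, and run the
// remaining "intent" delay in parallel with it, so a fast response is not
// pushed past the 500 ms mark by an artificial wait placed before the fetch.
const FETCH_DEBOUNCE_MS = 100;    // start the request no later than this
const MIN_DISPLAY_DELAY_MS = 500; // earliest point a preview should appear

function delay(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function onLinkHover(
  title: string,
  showPreview: (summary: PageSummary) => void
): Promise<void> {
  // Small debounce to filter out the cursor merely passing over the link.
  await delay(FETCH_DEBOUNCE_MS);

  // The fetch and the remaining intent delay run concurrently; the preview
  // shows at max(fetch completion, 500 ms) after the hover started.
  const [summary] = await Promise.all([
    fetchSummary(title),
    delay(MIN_DISPLAY_DELAY_MS - FETCH_DEBOUNCE_MS),
  ]);
  showPreview(summary);
}
```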

From https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews

  • Current: p50: 700ms, p75: 750ms, p95: 1s, p99: 3s.

Removing 400ms from this (0.5s - 100ms), gives:

  • Would be: p50: 500ms (300+200 post-wait), p75: 500ms (350+150 post-wait), p95: 600ms, p99: 2.5s

That's a big improvement already.

In addition, I would also recommend deciding on a maximum delay, i.e. a point beyond which you believe that showing a preview would be unexpected to the user. Normally when loading things on the web a result is never unexpected, but given that the user didn't click anything and no placeholder or indicator exists, I believe there should be an upper limit after which the preview should not be shown no matter when it arrives. For example: 10s.

Another piece of logic commonly found in this kind of feature is a smart debounce. For example: if within the course of a page session the code finds that more than 50% of at least 3 previews resulted in the 10s limit being reached, it may be better to disable the feature for the rest of that page view, to avoid continuing to make background requests on every hover for a user that is never going to see the preview and doesn't even know that it can exist.
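(A sketch of those two safeguards; the 10s cap and the "more than 50% of at least 3" rule are the figures from this comment, while the counters and function names are illustrative and reuse fetchSummary, PageSummary and delay from the earlier sketches.)

```typescript
// Sketch only: an upper bound after which a late response is discarded, and a
// per-pageview kill switch when too many previews hit that bound.
const MAX_PREVIEW_DELAY_MS = 10_000;

let attempts = 0;
let timeouts = 0;
let previewsDisabled = false;

async function tryShowPreview(
  title: string,
  showPreview: (summary: PageSummary) => void
): Promise<void> {
  if (previewsDisabled) {
    return; // stop issuing background requests for the rest of this page view
  }
  attempts += 1;

  // Race the fetch against the cap; a null result means the cap was hit first.
  const result = await Promise.race([
    fetchSummary(title),
    delay(MAX_PREVIEW_DELAY_MS).then(() => null),
  ]);

  if (result === null) {
    timeouts += 1;
    if (attempts >= 3 && timeouts / attempts > 0.5) {
      previewsDisabled = true; // smart debounce: give up for this page view
    }
    return;
  }
  showPreview(result);
}
```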

I take issue with the idea of delaying display of data once it's available, through animation or otherwise. When you load an article page, we don't make the article bounce into place for aesthetics. Unless proven otherwise, artificial delays and animations have no justification. You find that research cost is too high? Fine, reduce the artificial delays and remove the animation without research. That's what we're requesting, as "guardians" of site performance. The burden of proving that extra delays and animations have a measurable benefit is on you. It's not for us to prove the opposite, because there's extensive public research about the fact that hundreds of milliseconds in content visibility matter a great deal in terms of user engagement. It's always possible that a specific feature might perform better with an artificial delay, but that goes against what is known about performance, and it needs to be demonstrated, not based on a designer's hunch.

I remember someone (Toby, I believe?) stating in a meeting that this extension had been proven to increase user engagement through a measurable metric (session length, perhaps?). That means there's already a metric that's been used to measure the effectiveness of the feature as it is. Surely that can be extended to an A/B test of a show-content-ASAP version. No expensive 1-on-1 research or complicated research methodology design, just an A/B test flag, and if there's a significant engagement difference it should be visible.

@Gilles A few things. Please, let's not use design principles out of context.

When you load an article page, we don't make the article bounce into place for aesthetics

This is an entirely different experience and use case than Page Previews.

That's what we're requesting, as "guardians" of site performance

Yes, as "guardians" of user experience I care about perceived performance. It is a major part of UX. I just want to point out that it's a thought through thing and not something that's been overlooked. Let's discuss it instead trying to overriding each other by functions :)

The burden of proving that extra delays and animations have a measurable benefit is on you.

That is true, but we have to be resourceful about what we prove and what we don't. Perceived performance is more difficult to measure than site performance, but it looks like we have come to a point where we "have" to prove it.

because there's extensive public research about the fact that hundreds of milliseconds in content visibility matter a great deal in terms of user engagement.

Again, this is taking research out of context. When they talk about content visibility, they talk about first paint of the primary intent. In our case, the primary intent would be loading an article. Page Previews are a secondary and optional way of engaging.
Consider OS-level tooltips, which are an "optional" feature for learning more about something: every OS has an artificial delay before showing such a tooltip. There aren't even any server calls involved, yet you wait until the tooltip shows up.

  1. Because there is a primary function of the object, i.e. a button, a link, or a menu.
  2. As you are moving your mouse, you don't want a bunch of tooltips just popping up.

[pasted screenshot, 126×414 px: OS tooltips have artificial delays ranging from 0.9s to 2s]

Twitter
https://youtu.be/0I2JsP9qaes
Twitter has an average time-to-display of around ~700ms on a fast connection, including artificial delay. In fact, our micro-transition and delay times almost match Twitter's by coincidence.

Facebook
https://youtu.be/4zybemG-Hj4
Facebook has an average time-to-display of around 1.5s on a fast connection, including artificial delay.

Please examine it using dev tools to verify these numbers. Also, if you think Facebook and Twitter didn't do extensive research to find values that work for them, then you are mistaken. I personally have seen very extensive tests around everything related to perceived performance at Twitter. I'm not giving random examples of big companies and saying we should do what they are doing, but just illustrating that "artificial delay" is not bad user experience. Popovers on Facebook and Twitter are the closest relevant examples to Page Previews.

Moreover, if you are still concerned about this, I can set up an evaluative user test of the following with 7-10 participants. 7-10 is a standard number of participants for a usability evaluation.

Example test protocol

  1. Let users use page previews
  2. Test _with_ the existing consistent delay
  3. Ask questions about user expectations around the time it took for the page preview to show
  4. Surface any issues related to perceived performance, viz. "As a user, I expected it to show faster" or "I expected the preview to show as soon as I hovered over it"

Why not a comparative test between two different delays?
We can't expect users to compare Xms with Yms in a single session and know the difference right away. Usually people can't quantify and articulate nuanced differences right away. What we can do, though, is show the existing page previews and ask about their experience around time and delays. Obviously the questions will be framed in a proper way.

If you are not happy with the "qualitative tests", we can also set up a quantitative A/B test to evaluate behavior related to abandoning page previews. More accidental page previews = more abandoned page previews.

Just want to call out that things are there for a reason; do not mistake them for "fancy" animations or trends. Accidental page previews could be a major reason for not liking page previews entirely. An abrupt experience is a bad experience.

Even if we shave 100ms off this, I would still like to avoid accidental page previews at any cost. The artificial delay should act as a buffer for faster and slower connections, keeping the time-to-display constant, consistent, and not abrupt. Those three are the design requirements here.

I'm sorry to say, but what operating systems do and what other websites do is completely irrelevant. You put way too much trust in them doing the right thing and having verified it, without linking to any comprehensive study of a feature exactly like this one. Due diligence is very inconsistent for UX design on big web properties, and people make mistakes.

You've precisely dismissed arguments of my own saying that they couldn't be comparable because they weren't the same thing. But now, what you've seen other popups do in completely different contexts becomes gospel...

If engagement can be measured (I was under the impression that it had been, as justification to deploy this feature), we have the opportunity to A/B test and settle this matter very easily. I'm not convinced of the usefulness of a user test on such a small sample; it's very difficult to set up correctly and very easy to get wrong. And by asking questions about timing, you're bringing attention to something people shouldn't be paying attention to, completely skewing the results. This is the sort of feature where people don't have to do any thinking about it. If you put their nose in its mechanics, they won't think about it intuitively.

Whereas if you are already measuring the success of the feature in terms of engagement with real user metrics, as justification for the very deployment of the feature, the exact same metric can be used to pit variations of the feature against each other.

I disagree that "abandoned page preview" is a metric that matters. You're too caught up in the feature itself. And it doesn't really prove anything. If previews feel faster to show up, people might glance around more before picking something they'll go to. Versus navigating more slowly and over less previews because they unconsciously feel sluggish, causing people to glance at less previews. A/B testing something too narrow like that makes you miss the user intent and the greater context.

What matters is overall engagement on the site. Unless I missed something, it's the entire justification for this feature, it's what should be measured.

Look at it this way: it's the best opportunity to prove that you're right and that your assumptions about the various delays are justified, with undeniable metrics to back it up. It's great study material to publish. All I'm saying is that we can't have any artificial delays that haven't been proven to add benefits, because there's no proof out there that supports this argument yet. You just have best guesses, and for something that will make it very prominently to the front page of Wikipedia, that's not enough. Especially when you consider that A/B testing different delay values is a really trivial thing to do.

All I'm saying is that we can't have any artificial delays that haven't been proven to add benefits, because there's no proof out there that supports this argument yet.

Proof from the very reputable usability research firm of Nielsen and Norman [1]:

For hover interactions, the best cue for determining user intent is that the user’s mouse actually stops on the element that will trigger the event. The longer the stop, the stronger the indication of intent. If the mouse cursor is still in motion, intent to activate hidden content cannot be assumed because the mouseover could be part of the cursor’s path to another element, or merely a motion made while consuming content.

[2]

Then, the user’s intent to expose any corresponding hidden content can be assumed either immediately upon click or tap, or after the mouse cursor paused movement and remained in the target area for around 0.3–0.5 seconds

AKA Artificial delay

Now let's talk about the numbers they suggest:

  1. Show a visual cue about interactivity within 0.1s
  2. Wait 0.3 to 0.5 seconds to identify intent [2]
  3. Take 0.1s to show the content after the wait [2]

In our case, before showing the hidden content, the only visual cue we have is the underline on the hyperlink. It's feedback that shows within 0.1s of hovering over the URL.

Adding these three things up:
0.1 + 0.5 + 0.1 = 0.7s total delay before we show the popover.

Why 0.5 and not 0.3?
It depends on what kind of hidden content we are talking about: a subtle replacement of existing content (the examples in their research), or adding another considerably large object on top (page previews)? The bigger the change in the UI, the longer the delay.

Also remember:

The longer the stop, the stronger the indication of intent

Designing this feature, we really need a strong intent to trigger a considerable UI change.
The point is that 0.3 to 0.5 is considered safe. As a designer, I can choose 0.5 and be in the safe zone.

This is my professional judgement based on my experience and this research. Feel free to refute all of it.

If you think this is not good enough proof, we can lower the entire delay to 0.6s.

I don't mind doing A/B tests to prove this, but reading your thoughts on what metric we care about, I think we are going to have another long conversation about what success looks like in an A/B test.

[1] https://en.wikipedia.org/wiki/Jakob_Nielsen_(usability_consultant)
[2] https://www.nngroup.com/articles/timing-exposing-content/
[3] https://www.nngroup.com/articles/too-fast-ux/

P.S. I do take their research as gospel

What you're quoting are essays, not research. As reputable as N & N might be in the design world, the articles you're linking to are still opinion pieces and the numbers they provide are pulled out of thin air, with no indication as to how they ended up with those values. A best guess from someone famous is still a best guess.

You say that there's going to be a long conversation about what constitutes the success criteria. But I was under the impression that the success criteria for the extension itself as a whole had been established, with metrics, as justification to deploy it in production. Is that not the case?

@JKatzWMF gave me a detailed response about the success criteria:

The primary engagement metric for success we were using was this:

pageviews without hovercards < pageviews with hovercards + hovers > ~1 second

Obviously we are applying human intuition on top of this and valuing a pageview at higher than a hover, so that if 100 pageviews were replaced by 100 hovers and 1 pageview, we would not roll it out. We also look at pageviews that result from hitting the back button to go back to a page as less valuable (hovercards reduce these significantly). More details here: https://www.mediawiki.org/wiki/Beta_Features/Hovercards#Success_Metrics_and_Feature_Evaluation. Quantitative specifics here: https://www.mediawiki.org/wiki/Beta_Features/Hovercards/2016_A/B_Tests#Hovercards_A.2FB_Test_Results

We also look at things like page depth and time spent on page. Ultimately, visit frequency and time spent are the overall measures we care about, and hopefully we can see an impact on those with hovercards... but it takes a fairly radical impact to touch these higher-level metrics.

If I'm reading these documents correctly, the quantitative results didn't show a dramatic change in user behaviour outside of the context of the page previews themselves (i.e. longer sessions). The feature was promoted for further deployment on the basis of it getting a decent amount of use, and not getting disabled much.

Which means that subtle changes to the delay mechanism in an A/B test are unlikely to be visible in the general user behaviour that was measured previously, like session depth.

Link interaction seems like a viable candidate in the quantitative metrics that changed during that study. It's not ideal, but it might do the job. Essentially, the hypothesis being that if we make the page previews appear faster by reducing the different artificial delays, it might increase the amount of deliberate link interactions (>1s, >300ms).

@Gilles I know we're going to chat about this next week, but in the interest of time, I'd like to separate out what I see as 2 separate issues.

  1. technical performance - it seems like there is a green light here from your team. Please correct me if I am wrong.
  2. the user experience of an artificial delay - This is what we will discuss more next week. There is concern from you and others on your team that the artificial delay is not providing an optimal user experience and we want to discuss what measures we might take to test your hypothesis.

I'd like to separate blockers from potential improvements. Am I correct in suggesting that hovercards is not blocked by technical performance concerns and that we have your team's approval to release the feature, with acknowledgement that you have relevant concerns that we will likely want to explore?

We don't make that distinction. The time it takes for something to come up is performance. It's a blocker for widespread launch that something is artificially slow without the added delays having been proven to be absolutely necessary.

If you want to get this fast tracked before the research is done, we can provide a way forward with minimal debounce value, no animation, slight preloading if request cancellation is efficient, etc. We can essentially lay out a detailed plan of how this feature can be set up to have page previews come up as fast as possible, as soon as the debounce duration needed to establish the intent to bring up the hovercard is reached. That's the only delay we're comfortable with, the one that distinguishes the cursor moving across a link and the cursor staying on a link with the intent to get more information. But even the current value for that is debatable.

Artificial delays might need to be reintroduced if research demonstrates that @Nirzar's points are valid. But until that's the case, our position is to default with common sense, performance-wise, which is that reducing time-to-content as much as possible is necessary. Just like improving caching was also a necessity previously.

Right now what you're asking of us is unacceptable; we can't treat that as a non-blocker. The feature is literally slowed down on purpose in two ways, with no proof that this adds value. That goes directly against performance. Until proven otherwise, as soon as the user intent is established, the best performance for a feature is for the content to come up as fast as possible.

If we consider that the mental threshold for when the wait for the response starts is when the debounce value is reached (i.e. when the user has made up her mind about wanting to know more about that link), regardless of that value (500ms currently, I believe), we would need to bring up the page preview in less than 100ms for it to feel instantaneous: https://developers.google.com/web/fundamentals/performance/rail

Currently, we wait 500ms artificially no matter what after the user intent is confirmed, on top of which there is an animation of 200ms. Meaning it takes 700ms in the best case scenario for the content to appear, once the user intent is established. Which means that users experience a delay they normally associate with a significant task like browsing to another page. Which is shocking to us when you consider that the average API response time is a bit less than 100ms, which would put us in the "feels instantaneous" area if leveraged properly... In fact what was the point of improving the API performance (previous blocker), if now we're going to sit and wait if that happens to be fast? This situation is nonsensical to us.

It seems to me like the whole point of page previews is to get content faster than it would have taken to click and go to that other page, isn't it? Right now with that artificial time floor, for the average user it takes the same amount of time to reach firstPaint when clicking to load the actual page...

I perfectly understand @Nirzar's rationale that led to this (I have a degree in the HCI/UX field), but I completely object to its outcome in this case, because as it stands it's based entirely on subjective essays. It's irrational to fight in defense of these values since they came out of best guesses. Until the counter-intuitive phenomenon claimed to exist for popups is proven to be true, I can only treat it as the current misguided best practice of the UX community with regard to popups. I doubt popups are a feature whose perceived performance has ever been properly researched, so designers tend to defer to what already exists in other contexts, or to what someone with a reputation says on the subject. And past mistakes keep being made.

Anyway, the obvious thing to do would be to show the results of the API call as soon as they're available. In the average case, that means we would be around 100ms after the user intent was confirmed. That would be a tremendous performance improvement, with the only cost being to remove a few lines of code.

We can push performance even further if a lower debounce value is enough to distinguish the cursor passing through from the cursor staying on the link, before user intent is confirmed. I.e. if you consider that user intent is met at 300ms, but you can already distinguish the cursor just passing through at 200ms (the default value of the jQuery debounce library used, I believe), that means you can start the API call at the 200ms mark. Those values can be refined, but you get the idea: we can make a good guess that the user is going to want the page preview a few dozen milliseconds before they've really made up their mind, without wasting bandwidth loading every link the cursor passes through. Which means that by the time we cross the intent threshold (300ms in this case), the API call is finished on average, which puts us in the "feels instantaneous" area for an even wider spectrum of users, including those whose network is slower than average.
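(To make that overlap concrete, a sketch using the 200ms/300ms figures from this comment; the function and constant names are hypothetical, and fetchSummary, PageSummary and delay are reused from the earlier sketches.)

```typescript
// Sketch only: start the request once the cursor has lingered long enough to
// rule out a pass-through (200 ms), and show the preview once intent is
// assumed (300 ms) and the data has arrived, whichever comes later.
const PASS_THROUGH_DEBOUNCE_MS = 200;
const INTENT_THRESHOLD_MS = 300;

async function onHoverOverlapped(
  title: string,
  showPreview: (summary: PageSummary) => void
): Promise<void> {
  await delay(PASS_THROUGH_DEBOUNCE_MS);

  // The fetch (~100 ms on average per the comment above) and the remaining
  // 100 ms of the intent window run concurrently, so in the common case the
  // preview is ready the moment intent is confirmed.
  const [summary] = await Promise.all([
    fetchSummary(title),
    delay(INTENT_THRESHOLD_MS - PASS_THROUGH_DEBOUNCE_MS),
  ]);
  showPreview(summary);
}
```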

Hi, I have been trying to get up to speed on this before our meeting today. Reading through the discussion above, I have a few questions and remarks, some of which might be better to record in writing here.

(I understand part of the discussion is about whether the Performance team's role is to ensure that the intended user experience is delivered with minimum delay, or goes beyond that, but that we are moving that conversation elsewhere.)

One can't help observing that there seems to be some diversity of opinion within the Performance team about the reliability of Nielsen, in particular comparing T70861#2820251 (citing numbers from a 1993 publication as authoritative) and T70861#3204904 (basically dismissing two other much more recent articles from the Nielsen Norman Group entirely as a "best guess from someone famous"). As for "no indication as to how they ended up with those values", I agree that it would be good to know that, but then again in T70861#3227302 you cite yet another publication (https://developers.google.com/web/fundamentals/performance/rail ) as authoritative which, unless I overlooked something, does not give any indication either of where those values come from. So we have four rather similar sources (none of them academic peer-reviewed literature, BTW) that are treated rather differently. I sympathize with the notion that there is always some value in testing and verifying general industry guidelines in your own use case, but I haven't seen a good argument yet for why we are supposed to adhere strictly to two of these four sources and completely disregard the other two.

Link interaction seems like a viable candidate in the quantitative metrics that changed during that study. It's not ideal, but it might do the job. Essentially, the hypothesis being that if we make the page previews appear faster by reducing the different artificial delays, it might increase the amount of deliberate link interactions (>1s, >300ms).

We will likely talk more in the meeting about the upsides and downsides of embarking on such an experiment in general, but to write up some comments on the idea per se (also for the benefits of others not in the meeting):

This seems a reasonable hypothesis at first, but there are some additional things to take into account.

Considering that the Performance team probably has much more experience in measuring the impact of such speed increases on user behavior, could you share some results of this kind from your own research, so that we get a sense about what kind of effect size to expect? Concretely, in the more standard (and presumably already better understood) situation of full Wikipedia pages, do we know how much page view numbers (as the analogue of the deliberate link interactions metric) increase when the median page load time decreases by 100ms, say?

(I'm also asking out of related interest, because I try to identify and report the effects of software changes on our general traffic data as part of my work as an analyst in Reading. I happen to recall that Ori was very interested in that kind of question - e.g. regarding the HHVM rollout, a large performance project whose positive reception IIRC became a big argument in favor of creating a dedicated Performance team, the effect was partly assessed in
https://meta.wikimedia.org/wiki/Research:HHVM_newcomer_engagement_experiment . But I am not very familiar with current approaches to this kind of question.)

While the positive effects of showing the content faster may in many respects be comparable to the general pageviews case, the hovercards situation is more complicated because we also need to take negative effects into account - in particular the increase in unwanted cards shown, or the number of users who don't want to view cards and just move their mouse around as usual. We can't assume that the number of deliberate link interactions reflects this fully (a user who intentionally avoids all cards will have zero of them in either case). So the experiment as proposed would be insufficient to address this concern.

Gilles claimed this task.

FYI we usually link to the RAIL guidelines because they're easy to understand, but they're based on research that's been around for some time about what feels instantaneous, etc. It's less digestible to link to research papers, but I'll do some homework and dig up the actual research on the subject. I personally judge RAIL as informative only, not rules to follow. In most things our team deals with, for example time-to-content on pageload, lower is always better, so the quest to improve never stops and things like RAIL are irrelevant. It's only a useful thermometer when things are so slow they don't feel instantaneous anymore. But again that's more a talking point than a goal or a rule. The definition of what's instantaneous does vary between sources, and it evolves with time. People are more impatient with their devices now than they were a decade ago. Always healthy to review research on the subject, though, so I'll do it since we haven't looked at that for some time.

We've never done research on the benefits of lowering the pageload time, because you can never do that in isolation. In fact, establishing that we did improve pageload is a difficult exercise of its own, given how many moving parts there are (Wikipedia content changes, code deployed every week, the internet as a whole has connectivity ups and downs, our traffic varies quite a bit from day to day, etc.). And we tend to improve it in tiny increments. What we can do, and have discussed doing, is worsening it on purpose and seeing the effect in a controlled study. But it's low priority for us, because the research by Google and Amazon on the subject is quite compelling (about the extra traffic and user engagement when you make pageload faster).

The meeting brought to light that I spent most of the time on this task discussing a timing profile of the feature that was wrong or very outdated. A big misunderstanding, really, as I thought the debounce and the API/wait were additive rather than overlapped, and that the debounce was twice as high as it actually is. It's my fault for not looking at the current state of the code for myself before jumping into the discussion, sorry about that.

The current version, where the debounce delay (actually 150ms at the moment) is included in the minimum time of 500ms, makes the timing strategy acceptable. Especially if we consider that the user intent isn't necessarily formed at the 0 mark. So if we consider intent-to-content as the duration to pit against RAIL or other definitions of what feels instantaneous, the "worst case scenario" is actually that the user already knows for certain that they want the preview by the time they bring the cursor to it.

We won't argue about the animation because that's still the subject of unresolved debate in the performance community, with conflicting research findings. What matters is the time from action to feedback that the action worked. The 500ms might still be worth A/B testing at some later point, and as Tilman said during the meeting, it might be useful to wait until people are used to the feature and have accepted it on their wiki for some time before possibly looking at that.

And for the record, our team always doubts third-party information; we even doubt our own metrics constantly and end up finding issues with them. So it might look strange when we come into tasks like this with our big doubt firehose aimed at everything, but rest assured that we're even more paranoid about doing something wrong in our own work or trusting unreliable sources of information. Hopefully some of this discussion was constructive and perhaps useful despite the long misunderstanding.

Thanks @Gilles! Speaking for myself, I also found yesterday's meeting really useful to better understand your perspective, and of course you were raising a lot of valid points - I agree that it's valuable to always remain sceptical about both 3rd party information and one's own data.

...

We've never done research on the benefits of lowering the pageload time, because you can never do that in isolation. In fact, establishing that we did improve pageload is a difficult exercise of its own, given how many moving parts there are (Wikipedia content changes, code deployed every week, the internet as a whole has connectivity ups and downs, our traffic varies quite a bit from day to day, etc.). And we tend to improve it in tiny increments. What we can do, and have discussed doing, is worsening it on purpose and seeing the effect in a controlled study. But it's low priority for us, because the research by Google and Amazon on the subject is quite compelling (about the extra traffic and user engagement when you make pageload faster).

Right, I know about the challenges of determining effects in the absence of controlled experiments or A/B tests - it's something we encounter quite frequently in Reading too. Regarding performance, IIRC some people actually tried to find effects of the 2015 asynchronous JavaScript change, but with limited methodology - there may be some more sophisticated analysis options we haven't fully explored yet (e.g. Analytics Engineering recently installed the Prophet library for time series analysis on SWAP, which may enable us to get a better grip on daily/weekly seasonality and thus improve our ability to separate out the impact of changes).