Page MenuHomePhabricator

[Spike, 4hrs] Should we abort in flight page previews requests?
Closed, ResolvedPublic

Description

Right now, given our design choice in T70861#3129780 to trigger HTTP requests to show previews within 1second we generate a lot of needless HTTP requests as brought up in this thread:

Sam made a graph to show API utilisation https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews?refresh=1m&orgId=1&panelId=15&fullscreen

We should probably abort these requests, in the client...right?

Note: Any optimizations we might do to the timeouts, we need to do without impacting time to preview. The pane https://grafana.wikimedia.org/dashboard/db/reading-web-page-previews?refresh=1m&panelId=6&fullscreen&orgId=1&from=now-2d&to=now
needs to stay around 700ms (see the other panes with TTP percentiles for more details).

Questions to answer

  • Is the delay on page previews working as expected?

Yes, the delay works perfectly right, the AJAX requests to the summary endpoint are fired after 150ms.

  • Are there any circumstances where an API query is occurring when it should not be?

We fire the AJAX call after each time a user dwells on the link. When a user leaves the link and no popup is shown we should abandon the request immediately. When the user leaves the link we wait for ABANDON_END_DELAY (which is currently 300ms) and then reset the state. The problem is that the FETCH_DELAY_START is smaller (only 150ms) - what usually happens, when PagePreviews are in state ABANDON_START (which means user left the link, do not show the popup) we still send the AJAX call, even if a user is already pointing on the other link.

  • Is there any value in aborting requests?

Yes, there is always a value in aborting requests:

  • from the user side - fewer data sent to the user
  • from the service side - there is some value, but it will not make any considerable difference. 99% of summary requests are stored in varnish and never reach restbase. But for remaining 1% aborting request will reduce the server load (first restbase checks Cassandra cache, and if the summary is not present it calls MCS).

Outcomes

As a result of this SPIKE I created the T197700

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 25 2018, 4:50 PM
mobrovac added a subscriber: mobrovac.

Perhaps not abort them, but keep them in the client's cache and not show them?

Jdlrobson added a comment.EditedApr 25 2018, 5:08 PM

@mobrovac we don't show them, that's not a problem but the request still happens so probably a high percentage of hits to the endpoint are not necessary (other than keeping the cache full... which might be a good thing!).

Jdlrobson updated the task description. (Show Details)Apr 26 2018, 3:06 PM

We talked about this in grooming today. The only plus to aborting requests would be the client would not download the summary unnecessarily.

We're a bit concerned about page previews being invoked while the mouse is moving and text is being selected.
@Jhernandez @phuedx do you have any background on this? Should we also be listening for mouseout events to prevent these requests?

Something we should talk about in dev time?

Jdlrobson renamed this task from Should we abort in flight page previews requests? to [Spike, 4hrs] Should we abort in flight page previews requests?.Jun 12 2018, 4:45 PM
Jdlrobson triaged this task as High priority.
Jdlrobson added a project: Spike.
Jdlrobson updated the task description. (Show Details)
Jdlrobson moved this task from Needs Prioritization to Upcoming on the Readers-Web-Backlog board.

it's difficult to measure 150ms. To verify if this logic works properly, we can increase the timeout from 150ms to like 2s and verify the current fetch dealy start on the local instance.

The age is never reset for cached page previews (T184534). For many articles, this means that most clients will always issue a network request for a given preview regardless of whether they've previously requested the preview or not.

pmiazga added a comment.EditedJun 12 2018, 9:24 PM
TL;DR;

ABANDON_END_DELAY has to be lower than FETCH_DELAY_START, otherwise, we trigger AJAX event after every LINK_DWELL event.

Lets check if FETCH_DELAY_START works properly

Let's check if I set the FETCH_DELAY_START to 2500ms everything still works correctly as my assumption is that the AJAX event is fired without delay:

The first couple LINK_DWELL events were just moving a mouse pointer over lin - it properly calls LINK_DWELL, then ABANDON_START, and ABANDON_END.
As you see, the selected FETCH_START happens only after 2.5s

It looks like it works, let's analyze the flow when `FETCH_DELAY_START = 150ms

The strange thing happens when I set the FETCH_DELAY_START is set to 150ms, the FETCH_START event should should happen after 0.15s

The FETCH_START event is called after 0.07s (70ms), 0.09s (90ms), 0.1s (100ms),. 0.14s (140ms), but the previous event is ABANDON_START.

The ABANDON_START doesn't "hold" the request, it waits 300ms (https://github.com/wikimedia/mediawiki-extensions-Popups/blob/master/src/actions.js#L30), and if the mouse pointer didn't re-enter the link it resets the state.
But because the ABANDON_END_DELAY is longer than FETCH_DELAY_START we will always fire an event.
Looks close to the second screenshot,

  • first set of events`ABANDON_START` happened 70ms after LINK_DWELL, and then FETCH_START happened 70ms after ABANDON_START. 70+80 => 150ms
  • lets check something in the middle - ABANDON_START happened 50ms after LINK_DWELL, then FETCH_START started 100ms after ABANDON_START => 50+100 => 150ms
I think we know whats going on, lets set ABANDON_END_DELAY to 150ms, and FETCH_DELAY_START to 150ms and let's see what happens


Almost everytime it fires both - abandon and fetch events.

Last try, lets set ABANDON_END_DELAY to 150ms, and FETCH_DELAY_START to 300ms and let's see what happens


It works as it supposed to do, first it ABANDON_START, then ABANDON_END and no FETCH_START is called. When I'm moving the mouse I do not fire any AJAX requests. Although, if I keep the mouse pointer over the link for more than 150ms and then move the mouse pointer away it will trigger the AJAX call (fetch will start after 300ms, ABANDON_END will come after 300ms). But with that settings, AJAX events are not fired after each LINK_DWELL event.

Possible solutions:

  • code that triggers (timed action) FETCH_START first should check current state, and not fire event when state is in ABANDON_START, instead it should wait
  • ABANDON_END should cancel the timed action of calling AJAX, and we have to introduce ABANDON_ABORT that triggers the FETCH_START event once again
  • leave code as it is, just make ABANDON_END_DELAY < FETCH_START_DELAY / 2
  • if preview is not present, fire ABANDON_END immediately

Another possible solution:

  • Whenever a new link interaction is created, if there's an ongoing request, then cancel it.
Jdlrobson assigned this task to pmiazga.Jun 13 2018, 6:40 PM
Jdlrobson moved this task from To Do to Doing on the Readers-Web-Kanbanana-Board-Old board.

Seems like this should be in "doing". Please correct me if wrong!

Jdlrobson removed pmiazga as the assignee of this task.Jun 18 2018, 5:22 PM
Jdlrobson moved this task from Doing to To Do on the Readers-Web-Kanbanana-Board-Old board.
pmiazga updated the task description. (Show Details)Jun 19 2018, 3:26 PM
pmiazga moved this task from To Do to Ready for Signoff on the Readers-Web-Kanbanana-Board-Old board.

We fire the AJAX call after each time a user dwells on the link.

To clarify this do you mean 150ms after a user dwells on the link? I'm guessing the 150ms is so minimal that it's essentially 100% the time but I want to make sure I completely understand.

Thanks for the clear answers @pmiazga
So it sounds like your proposal is that ABANDON_START should abort any pending API requests? Is that correct?

So it sounds like your proposal is that ABANDON_START should abort any pending API requests? Is that correct?

I'd recommend LINK_DWELL. There's already logic to determine if there's a new interaction being started (when we would want to abort in-flight requests) for that action.

We fire the AJAX call after each time a user dwells on the link.

To clarify this do you mean 150ms after a user dwells on the link? I'm guessing the 150ms is so minimal that it's essentially 100% the time but I want to make sure I completely understand.

yes, the delay is 150ms after the LINK_DWELL - it doesn't matter if any other state is called in the meantime, code always wait 150ms and fires AJAX request.
I did some testing, and when I move the mouse quickly (like I want to move to a link in the middle of the article) usually I dwell on couple links on the way, the time between LINK_DWELL and ABANDON_START for a link that was on the path of mouse pointer is usually less than 100ms.

Thanks for the clear answers @pmiazga
So it sounds like your proposal is that ABANDON_START should abort any pending API requests? Is that correct?

I think that the best approach is to:

  • on new LINK_DWELL reset the state
  • on FETCH_START check if isDwelling = true before firing the request. This can be tricky, because if user dwells on the same link before ABANDON_END_DEALY we do not fire LINK_DWELL again.

Thanks for looking into this. Next step is to estimate T197700

Jdlrobson closed this task as Resolved.Jun 19 2018, 9:31 PM
Vvjjkkii renamed this task from [Spike, 4hrs] Should we abort in flight page previews requests? to o8daaaaaaa.Jul 1 2018, 1:13 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Jdlrobson as the assignee of this task.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
Mainframe98 renamed this task from o8daaaaaaa to [Spike, 4hrs] Should we abort in flight page previews requests?.Jul 1 2018, 10:08 AM
Mainframe98 closed this task as Resolved.
Mainframe98 assigned this task to Jdlrobson.
Mainframe98 updated the task description. (Show Details)
Mainframe98 added a subscriber: Aklapper.