Page MenuHomePhabricator

Minimise incidental HTTP requests caused by Page Previews
Closed, ResolvedPublic3 Story Points

Description

Since the delay to start fetching a preview from the API and the RESTBase API requires resource revalidation at the edge – s-max-age is non-zero while max-age: 0 – so that the UA doesn't hold stale resources in its private cache(s). Consequently, there are a lot of HTTP requests made while casually moving your mouse around the page.

Steps

1. Increase the API request delay

The median API response time is currently 115 ms. Since the API response is artificially delayed so that the preview starts fading in at 500 ms, we can increase the API request delay to 150 ms without affecting the current UX (see also @Krinkle's remarks in T70861#3129780).

2. Review the Cache-Control header
... with @GWicke with a view to increasing the max-age part.

3. Cache API responses in memory for the duration of the page session

While #1 will definitely decrease the number of incidental HTTP requests, it seems unlikely that there's great value in a preview for a link changing between interactions during the same page session, e.g. the user dwells on link A, sees preview version 1, abandons link A, waits 1 minute, dwells on link A, sees preview version 2. Since we don't indicate how recently the linked page was updated, seeing a different preview may be confusing.

Depending on session length data from the ReadingDepth schema, we may want to use an LFU or LRU replacement strategy.

This requires input from both Product and Design.

Details

Related Gerrit Patches:

Event Timeline

phuedx created this task.Mar 24 2017, 5:31 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 24 2017, 5:31 AM

Change 344574 had a related patch set uploaded (by Phuedx):
[mediawiki/extensions/Popups@master] actions: Increase API request delay to 200 ms

https://gerrit.wikimedia.org/r/344574

phuedx updated the task description. (Show Details)Mar 24 2017, 11:53 AM
phuedx updated the task description. (Show Details)
ovasileva triaged this task as High priority.Mar 27 2017, 8:00 PM
phuedx updated the task description. (Show Details)
Krinkle added a subscriber: Krinkle.EditedMar 29 2017, 11:01 PM

we may want to use an LFU or LRU replacement strategy.
while max-age: 0 – so that the UA doesn't hold stale resources in its private cache(s).

Holding on to them in JS memory would work, but adds complexity and demands more of the device in terms of memory. We don't need it to be in memory per se. On disk is good enough. (Local disk hit is still much better than a network roundtrip, there are exceptions to that, but that's out of scope here.)

Letting the browser cache it natively means it'll store it on disk (probably async), and also keeps in memory based on its own heuristics. When short on memory, it can't do much about JS, but it can easily do something about its http cache.

Setting max-age to 30 minutes or something like that with private, must-revalidate would do the trick. "Private" prevents proxies from holding on to it. "must-revalidate" ensures it will not use it beyond the initial 30 minutes without checking back with the server first.

Change 344574 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] actions: Increase API request delay to 150 ms

https://gerrit.wikimedia.org/r/344574

@Krinkle: Agreed. The as-yet-unwritten step 1.5 is confirming the value of the Cache-Control header returned by RESTBase with @GWicke and tweaking it as necessary.

phuedx updated the task description. (Show Details)Mar 29 2017, 11:58 PM
phuedx updated the task description. (Show Details)Mar 30 2017, 12:46 AM
phuedx updated the task description. (Show Details)
Jdlrobson added a subscriber: Jdlrobson.

Moved to ready for sign off per https://phabricator.wikimedia.org/T161284#3142468 for verification.

phuedx removed phuedx as the assignee of this task.Mar 31 2017, 7:08 PM
phuedx added a comment.EditedMar 31 2017, 7:29 PM

I'm expecting this number to go down but other metrics to be unaffected when the next MediaWiki train rolls out of the station (Tuesday, 4th – Thursday, 6th). This should be validated as part of sign off.

I think that #3 is mostly out of scope at this stage but can always be revisited as we roll out to more and more of the Wikipedias.

@GWicke: Could you clarify the choice of the s-maxage=$sMaxAge, max-age=0, must-revalidate value for the Cache-Control header.

GWicke added a comment.EditedMar 31 2017, 7:44 PM

@phuedx: That is just the safe default for any resources that are actively purged. For anything used for editing, we need to make sure that clients pick up changes quickly.

For page summaries, a few minutes (120s?) of client side caching should be safe. Vandalism would still be cleared out fairly quickly. Repeat requests are also conditional, and don't transfer the content unless it has actually changed. This makes them fairly cheap. Would a 120s max-age work for you?

Change 345877 had a related patch set uploaded (by GWicke):
[operations/puppet@production] WIP: Add cache-control option that allows for short term client caching

https://gerrit.wikimedia.org/r/345877

Krinkle added a comment.EditedMar 31 2017, 7:59 PM

@GWicke For Navigation Timing we use 3 minutes as the maximum time to capture a page load, which is reached frequently enough (p95 is 6s-50s most days, p99 is 20s-2.5min). Given that this is a feature to be used *after* page load ends, I would recommend 5 to 10 minutes - but no less than 3 minutes at the very least.

5 minutes should liberally cover the majority of page view sessions, as well as a small number of previews on repeat-view (forward-backward), and previews of the same link on multiple pages within the overall user session. I'm sure Reading will have more numbers on the average length of a page view session.

As for conditional requests *after* the max-age has expired, this is mostly irrelevant. The previews are small, and their size-cost insignificant. The main cost here is latency.

GWicke added a comment.EditedMar 31 2017, 8:05 PM

As for conditional requests *after* the max-age has expired, this is mostly irrelevant. The previews are small, and their size-cost insignificant. The main cost here is latency.

With the current popup delay and API response times, follow-up requests should only make a visible difference on very slow (p95+) connections.

In any case, we need to find a max-age that works for all users of the summary end point. 5 minutes should be workable for them as well, but we should double check. @Dbrant @Mhurd, do you see any issues with letting clients cache summary responses for five minutes?

@Tbayer may also be able to provide more information about the current average page session length given that we're actively collecting ReadingDepth data.

phuedx added a comment.Apr 3 2017, 9:08 AM

Also, we used to request a max-age of 5 minutes for the MediaWiki API requests that we superseded by the RESTBase endpoint.

^ Per T161284#3147394, signing off on this task is blocked until Thursday, 6th April.

@Tbayer may also be able to provide more information about the current average page session length given that we're actively collecting ReadingDepth data.

I don't know all the details and context here, but wouldn't something like the median (or p90) time between the first and last API request (for the same link within the same pageview session) be more pertinent here? We can pull that from the Popups schema.

MBinder_WMF set the point value for this task to 3.
phuedx added a comment.EditedApr 7 2017, 9:46 AM

I don't know all the details and context here, but wouldn't something like the median (or p90) time between the first and last API request (for the same link within the same pageview session) be more pertinent here? We can pull that from the Popups schema.

That'd certainly more accurate. As discussed, if you could pull that from the Popups schema, then that'd be great!

Change 348124 had a related patch set uploaded (by Ppchelko):
[mediawiki/services/restbase/deploy@master] Config: Introduce Cache-Control option with some client caching.

https://gerrit.wikimedia.org/r/348124

Created a PR for RESTBase to allow 5 minutes of client-side caching: https://github.com/wikimedia/restbase/pull/795

I think that should be OK for all of the clients.

Change 345877 abandoned by GWicke:
WIP: Add cache-control option that allows for short term client caching

Reason:
Replaced by scap based patch.

https://gerrit.wikimedia.org/r/345877

Change 348124 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Config: Introduce Cache-Control option with some client caching.

https://gerrit.wikimedia.org/r/348124

@mobrovac, @Pchelolo: ^ Will that change be deployed next week (the week of April 24th)?

Mentioned in SAL (#wikimedia-operations) [2017-04-17T15:40:18Z] <mobrovac@tin> Started deploy [restbase/deploy@6595298]: Update client caching headers for T161284

Mentioned in SAL (#wikimedia-operations) [2017-04-17T15:48:33Z] <mobrovac@tin> Finished deploy [restbase/deploy@6595298]: Update client caching headers for T161284 (duration: 08m 15s)

@mobrovac, @Pchelolo: ^ Will that change be deployed next week (the week of April 24th)?

It has been deployed and is live in production now.

Outstanding!

phuedx added a comment.EditedApr 17 2017, 4:40 PM

We can always revisit the max-age value once we're satisfied with the Event Logging instrumentation. For now, it's the same as the value we specify when consuming the MediaWiki API.

@Tbayer: Are you still able to take a look at this? It's low priority given that we're now defaulting to 5 minutes.

Krinkle removed a subscriber: Krinkle.Apr 25 2017, 3:16 AM
Jdlrobson added a project: Services.

Can this be resolved?

phuedx closed this task as Resolved.Apr 27 2017, 5:56 AM

Yes. #3 is made redundant by the change to setting the max-age header to 5 minutes – the UA should cache the response in memory or on disk depending on its own heuristics and won't re-request the resource until the cached response is stale.

mobrovac edited projects, added Services (done); removed Services.Apr 27 2017, 8:58 PM