[SPIKE] Investigate new approach to article fetching/cache-invalidation logic
Closed, ResolvedPublic

Description

We have discussed two approaches to fetching & invalidating cached articles. Our ideal strategy would:

  • Reduce or eliminate redundant article downloads (e.g. downloading the same revision of the article we have locally)
  • Handle the offline case gracefully (i.e. not show a banner when viewing a cached article offline)
  • Try to honor (implied) promises to the user about content "freshness":
    • As a user with a good internet connection, I should never see content that's significantly outdated, so that I can get the latest information and reap the full benefit of Wikipedia's active editor community
    • As a user with a poor or nonexistent internet connection, I should be able to view the last revision of an article that I downloaded, so that I can have as full an experience as possible while offline

New proposed algorithm

In a nutshell, never download content unless the user asks us to. It's our job to let users know when the content they're viewing is out of date, and empower them to decide (given their current internet connection and/or data restrictions) whether or not to download it.

Spec (Article View)

Scenario: Online w/ warm cache
Background:

Given $article is cached
And my connection is online

And there's a newer revision of $article
When I navigate to it
Then I should see the locally stored revision of $article immediately
And I should eventually see a prompt to tap the refresh button to see the latest revision

And there's no newer revision of the article
When I navigate to it
Then I should see the locally stored revision of $article immediately
And I should not see any banners

And the app can't retrieve the current revision of $article
When I navigate to it
Then I should see the locally stored revision of $article immediately
And I should see a "warning" banner saying that we failed to determine the current revision of the article

Scenario: Offline w/ warm cache

Given $article is cached
And my connection is offline
When I navigate to it
Then I should see the locally stored revision of $article immediately
And I should not see any banners (suppress offline warnings when content is cached)
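
As a rough illustration of the two warm-cache scenarios above, the banner choice can be written as one small decision function. This is only a sketch: `Banner`, `banner_for_cached_article`, and the convention that `latest_revision` is `None` when the revision check failed are hypothetical names and assumptions, not part of the app. It also assumes revision IDs are integers that increase with newer revisions.

```python
from enum import Enum
from typing import Optional


class Banner(Enum):
    """Hypothetical banner states for a cached article view."""
    NONE = "none"
    REFRESH_PROMPT = "refresh_prompt"
    REVISION_CHECK_FAILED = "revision_check_failed"


def banner_for_cached_article(
    online: bool,
    cached_revision: int,
    latest_revision: Optional[int],
) -> Banner:
    """Pick the banner to show once the cached revision is already on screen.

    latest_revision is None when the revision check failed (assumption).
    """
    if not online:
        # Offline w/ warm cache: suppress offline warnings entirely.
        return Banner.NONE
    if latest_revision is None:
        # Online, but we could not determine the current revision.
        return Banner.REVISION_CHECK_FAILED
    if latest_revision > cached_revision:
        # A newer revision exists: prompt, but never download unasked.
        return Banner.REFRESH_PROMPT
    return Banner.NONE
```

In every branch the cached revision is shown immediately; the function only decides what, if anything, appears on top of it.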

Open questions

  • How exactly should we indicate in the UI that the revision of the article the user is seeing is out of date and/or prompt them to refresh?
  • If revisions change very frequently with very little change in the article, will this approach be too noisy?
  • Should we indicate the user is browsing the article offline if we fail to fetch the latest revision?

Legacy algorithm

Spec (Article View)

Scenario: Online w/ warm cache
Background:

Given $article is cached
And my internet connection is online

When I navigate to it via (search-result|link)
Then I should see a blank view with a progress bar
And I should eventually see the latest version of $article

When I navigate to it via (*not* search-result|link)
Then I should see the $article content immediately
And I should not see any error banners

Scenario: Offline w/ warm cache
Background:

Given $article is cached
And my internet connection is offline

When I navigate to it
Then I should see the cached content
And I should not see an error banner
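
For comparison, the legacy behavior described above can be sketched the same way. `ViewPlan` and `legacy_view_plan` are hypothetical names. The spec does not say whether a background fetch happens when the cached content is shown immediately, so this sketch assumes none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewPlan:
    """Hypothetical description of what the article view does first."""
    initial_view: str   # "progress_bar" or "cached"
    fetch_latest: bool  # whether the latest version is downloaded


def legacy_view_plan(online: bool, via_search_or_link: bool) -> ViewPlan:
    """Legacy behavior for an article that is already cached."""
    if online and via_search_or_link:
        # Blank view with a progress bar, then the latest version.
        return ViewPlan(initial_view="progress_bar", fetch_latest=True)
    # Any other navigation, or offline: show the cached content right away.
    # Assumption: no fetch is made in this branch.
    return ViewPlan(initial_view="cached", fetch_latest=False)
```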


This should also address the issue where users see an offline error banner when viewing a cached article:

Steps to repro

  1. Save a featured article
  2. Enable airplane mode
  3. Tap on featured article

Expected results

No banner

Actual results

Offline banner:

Screen Shot 2015-12-14 at 3.08.41 PM.png (1×750 px, 194 KB)

Event Timeline

BGerstle-WMF raised the priority of this task from to Needs Triage.
BGerstle-WMF updated the task description.
BGerstle-WMF subscribed.
BGerstle-WMF lowered the priority of this task from High to Medium. Dec 14 2015, 8:39 PM

@JMinor @Nirzar can you clarify your intended behavior for when we show the offline banner? Specifically: does it matter if the user navigates to a saved page from somewhere other than "saved pages" (e.g. Explore)?

@BGerstle-WMF it doesn't. We can consider it cached (in reality it's saved), but if the content is there to show, we don't need to ping the user about the internet.

@Nirzar sure, so to be clear:

Given an article has been downloaded
And my internet connection is offline
When I navigate to it
Then I should see the cached content*
And I shouldn't see a banner

* There's a larger question of when we should even show cached content, but I hope to bring that up tomorrow. For example, we might show cached content while indicating that we can't fetch newer versions of the article (or refresh).

BGerstle-WMF renamed this task from As a user, I shouldn't see offline banner when navigating to a saved page to [SPIKE] Investigate new approach to article fetching/cache-invalidation logic. Dec 16 2015, 3:32 PM
BGerstle-WMF updated the task description.

New proposed algorithm
In a nutshell, never download content unless the user asks us to. It's our job to let users know when the content they're viewing is out of date, and empower them to decide (given their current internet connection and/or data restrictions) whether or not to download it.

FYI - this was proposed, but I don't recall it ever being selected as a solution at the meeting. I believe the spirit of this spike is to find out if we can determine if we have stale content. What we do with that information is still something that needs to be resolved. So I think that is a little bit of putting the cart before the horse.

Additionally, this proposal does not line up with our goals for 5.0 - which were specifically targeted towards Global North. "never download content unless the user asks us to." does not seem like a sensible default for that audience, who tend to have good, fast internet - I personally can't think of many apps that default to stale content and make the user manually refresh even if a good connection is available.

Looking at @BGerstle-WMF's work on Github, it appears we can get the revision and determine if we have stale content pretty easily.

With that in mind, as a first pass, I would recommend going with tried-and-true plain Jane caching logic:

  1. Load article from cache if we have it
  2. Check if the article is stale
  3. If stale, download and show the fresh content

…and go from there. We might want/need to do further optimizations but we should see how a simple solution works before we do.
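
A minimal sketch of those three steps, assuming articles are plain dicts with a `revision` key and that the caller supplies the revision lookup, download, and render functions. All of these names are hypothetical stand-ins, not the app's actual API. A missing cache entry is treated as stale.

```python
from typing import Callable, Dict


def show_article(
    title: str,
    cache: Dict[str, dict],
    latest_revision_id: Callable[[str], int],
    download: Callable[[str], dict],
    render: Callable[[dict], None],
) -> None:
    """Cache-first loading: show what we have, then refresh only if stale."""
    # 1. Load the article from the cache if we have it.
    cached = cache.get(title)
    if cached is not None:
        render(cached)

    # 2. Check if the article is stale (a missing entry counts as stale).
    latest = latest_revision_id(title)
    is_stale = cached is None or cached["revision"] != latest

    # 3. If stale, download and show the fresh content.
    if is_stale:
        fresh = download(title)
        cache[title] = fresh
        render(fresh)
```

With an up-to-date cache entry, `download` is never called and the article is rendered exactly once.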

FYI - this was proposed, but I don't recall it ever being selected as a solution at the meeting.

I thought we had said to spike it, and see what we found. Given what we've found, we can decide what to do. It seems like you're proposing we always download if we determine it's "stale" (i.e. don't prompt the user).

Additionally, this proposal does not line up with our goals for 5.0 - which were specifically targeted towards Global North.

Debatable, IMO, but nonetheless: IIUC this was more about either reverting to legacy functionality or seeing if we could make a quick win for being a better platform citizen (not downloading data if we "don't have to."). I agree a deeper dive of adaptive connection handling is out of scope for 5.0—but I think the work done thus far is a good, small first step which is easy to iterate on.

And, my verbiage in the task desc was about trying to put in words what our longer term goals were for the app's efficiency & gracefulness w.r.t. networking. Mostly as a means to help focus the discussion here and keep concrete trade-offs in mind.

@Fjalapeno w.r.t. not prompting the user, I don't have any strong feelings, now that we can ensure that we're downloading new content.

In the long term (potentially Q3) we can try using the mobile endpoints in RESTBase which support If-Modified-Since: cache control. This would essentially do the check for us, server side—although w/o checking for minor edits, etc.
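
To make the If-Modified-Since idea concrete, here is a small sketch of the client side of a conditional request. It uses only standard HTTP semantics (a 304 Not Modified response means the cached copy is still good) and makes no assumption about the actual RESTBase endpoint or its responses. The helper names are hypothetical, and `last_modified` is assumed to be a timezone-aware datetime taken from the earlier response.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_headers(last_modified: datetime) -> dict:
    """Build headers asking the server for a body only if it has changed.

    last_modified must be timezone-aware (assumption of this sketch).
    """
    # HTTP dates are expressed in GMT, so convert to UTC before formatting.
    as_utc = last_modified.astimezone(timezone.utc)
    return {"If-Modified-Since": format_datetime(as_utc, usegmt=True)}


def body_to_show(status: int, cached_body: str, response_body: str) -> str:
    """Choose which content to display based on the HTTP status code."""
    if status == 304:
        # Not Modified: the server sent no body; the cached copy is current.
        return cached_body
    if status == 200:
        # The article changed since last_modified; use the new body.
        return response_body
    raise ValueError(f"unexpected status {status}")
```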

This was discussed at today's engineering review meeting, and we came up w/ a new algorithm which has a minimal best-case delay (warm, updated cache) of 500-800ms while still only fetching content if it's out of date:

  • Send pre-flight request for latest revision w/ short timeout (500ms, maybe 800-1000?)
  • If the timeout expires: show cached content w/ a warning about the failure to fetch the latest revision
    • else if the cached revision is not the latest, fetch the latest revision w/ moderate (default) timeout
    • else show the cached content, which is already the latest revision
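
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation: `load_article` and its injected callables are hypothetical, the article is assumed to be already cached, and the pre-flight function is assumed to raise `TimeoutError` when the short timeout expires.

```python
from typing import Callable, Optional, Tuple

# Hypothetical warning text; the real UI copy is not specified here.
WARNING = "Could not check for a newer revision"


def load_article(
    cached: dict,
    fetch_latest_revision_id: Callable[[float], int],
    fetch_article: Callable[[int], dict],
    preflight_timeout: float = 0.5,
) -> Tuple[dict, Optional[str]]:
    """Return (article to show, optional warning) for a cached article."""
    try:
        # Pre-flight request for the latest revision ID w/ a short timeout.
        latest_id = fetch_latest_revision_id(preflight_timeout)
    except TimeoutError:
        # Timeout expired: fall back to the cache and warn the user.
        return cached, WARNING

    if cached["revision"] != latest_id:
        # The cache is out of date: fetch the latest revision.
        # fetch_article is expected to use its own moderate (default) timeout.
        return fetch_article(latest_id), None

    # The cache is already the latest revision: nothing to download.
    return cached, None
```

Injecting the two network calls keeps the branching logic testable without a network connection.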