
Pages should be cached using the canonical title
Closed, Declined; Public; 1 Story Point

Description

We sometimes cache pages using a redirect title as the key and at other times using the article's canonical title. As a result, several differing copies of the same article can end up in the cache, and which copy is loaded depends on the title in the link that was followed. We should cache each page once, under its canonical title only.

Steps to reproduce

  1. Search for and open the (en) article [[Aurora Plastics Corporation]]
  2. Click the link to [[Monogram Models]] at the end of the lead paragraph to navigate to that article (a redirect to [[Monogram (company)]]).
  3. Observe in log: "Writing to cache: Monogram Models"
  4. Refresh the page
  5. Observe in log: "Writing to cache: Monogram (company)"
  6. Close and reopen the app
  7. Observe in log: "Using page from cache: Monogram Models"

Expected results

The page is cached once, with the cache key constructed from the canonical title.
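A minimal sketch of the expected behavior, assuming the server response reports the canonical title after following redirects. `Page` and `CanonicalPageCache` are hypothetical names for illustration, not the app's actual `PageCache` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical page model: canonicalTitle is assumed to come from the
// server response, i.e. the title after any redirect has been followed.
final class Page {
    final String canonicalTitle;
    final String html;

    Page(String canonicalTitle, String html) {
        this.canonicalTitle = canonicalTitle;
        this.html = html;
    }
}

// Sketch of a cache that keys every entry on the canonical title only.
final class CanonicalPageCache {
    private final Map<String, Page> pages = new HashMap<>();

    // The title used to reach the page is deliberately ignored; only the
    // canonical title becomes the key, so a later load or refresh
    // overwrites the existing entry instead of adding a second copy.
    void put(String requestedTitle, Page page) {
        pages.put(page.canonicalTitle, page);
    }

    // Returns the cached page for a canonical title, or null if absent.
    Page getByCanonicalTitle(String canonicalTitle) {
        return pages.get(canonicalTitle);
    }

    int size() {
        return pages.size();
    }
}
```

Under this scheme, steps 3 and 5 of the reproduction both write the same entry, `Monogram (company)`. The trade-off is that a lookup needs the canonical title up front, which a redirect link title alone does not provide when offline.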

Actual results

The page cache can in principle hold as many copies of a page as there are redirect titles pointing to it, plus one for the canonical title. These copies can differ, and which one you get depends on how you navigated to the page.
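For contrast, a simplified sketch of the behavior seen in the log: a hypothetical `TitleKeyedPageCache` that uses whichever title was requested as the key. It models the symptom only and is not the app's `PageCache` code.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the observed behavior: the cache key is whatever
// title was used to reach the page, so a redirect title and the canonical
// title become two separate entries that can hold different content.
final class TitleKeyedPageCache {
    private final Map<String, String> htmlByTitle = new HashMap<>();

    // requestedTitle is the title from the link followed, or the resolved
    // title when the page is refreshed.
    void put(String requestedTitle, String html) {
        htmlByTitle.put(requestedTitle, html);
    }

    // Returns the copy stored under exactly this title, or null if none.
    String get(String requestedTitle) {
        return htmlByTitle.get(requestedTitle);
    }

    int size() {
        return htmlByTitle.size();
    }
}
```

Replaying steps 3, 5 and 7 against this model leaves two entries, and following the [[Monogram Models]] link again returns the older copy written in step 3.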

Stack trace

12-08 14:51:38.838 18270-18355/org.wikipedia.dev D/org.wikipedia.page.PageCache$GetPageFromCacheTask: performTask():146: Reading from cache: Monogram Models
12-08 14:51:38.879 18270-18270/org.wikipedia.dev D/org.wikipedia.page.PageDataClient$11: onGetComplete():482: Using page from cache: Monogram Models
...
12-08 14:51:50.346 18270-18323/org.wikipedia.dev D/org.wikipedia.page.PageCache$GetPageFromCacheTask: performTask():146: Reading from cache: Aurora Plastics Corporation
12-08 14:51:50.369 18270-18270/org.wikipedia.dev D/org.wikipedia.page.PageDataClient$11: onGetComplete():482: Using page from cache: Aurora Plastics Corporation
...
12-08 14:53:44.128 18270-18311/org.wikipedia.dev D/org.wikipedia.page.PageCache$AddPageToCacheTask: performTask():84: Writing to cache: Monogram Models
12-08 14:53:50.774 18270-18381/org.wikipedia.dev D/org.wikipedia.page.PageCache$AddPageToCacheTask: performTask():84: Writing to cache: Monogram (company)
...
12-08 14:54:09.166 20509-20620/org.wikipedia.dev D/org.wikipedia.page.PageCache$GetPageFromCacheTask: performTask():146: Reading from cache: Monogram Models
12-08 14:54:09.195 20509-20509/org.wikipedia.dev D/org.wikipedia.page.PageDataClient$11: onGetComplete():482: Using page from cache: Monogram Models

Environments observed

App version: dev
Android OS versions: Nougat
Device model: 6P
Device language: en

Event Timeline

Mholloway created this task. Dec 8 2016, 8:05 PM
Restricted Application added a subscriber: Aklapper. Dec 8 2016, 8:05 PM
Mholloway updated the task description. Dec 8 2016, 8:06 PM
Mholloway updated the task description. Dec 8 2016, 8:42 PM
Mholloway updated the task description.
Mholloway updated the task description.

Change 326033 had a related patch set uploaded (by Mholloway):
Standardize caching on the canonical page title

https://gerrit.wikimedia.org/r/326033

Mholloway renamed this task from "Page caching does not correctly handle redirected page titles" to "Pages should be cached using the canonical title". Dec 8 2016, 9:20 PM
Mholloway claimed this task.

Change 326033 abandoned by Mholloway:
Standardize caching on the canonical page title

Reason:
per above

https://gerrit.wikimedia.org/r/326033

Adding @Dbrant's comments on the patch here for posterity:

There is actually a non-obvious use case for caching pages on their non-canonical title: offline forward navigation.
Suppose you navigate to an article, and click a whole bunch of blue links, some of which are redirects. Then you go offline, and start back at the original article. The user should expect to be able to click those same links, and have those articles load seamlessly (because they're cached). The only way to do this is if we cache the articles based on the non-canonical title, as it appears in the link.
Quick example: Navigate to [[Polar ice cap]], and at the end of the second paragraph, click the [[ice sheets]] link, which will redirect to [[Ice sheet]]. Now go back to [[Polar ice cap]], go offline, and click [[ice sheets]] again.
Anyway, my question would be: would Retrofit caching take care of all these issues on its own? If so, we should probably not mess with our existing caching for now, and just fully switch over to Retrofit caching when ready.
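One possible way to support offline forward navigation while still storing a single copy, sketched here as a hypothetical rather than anything proposed in this task or the patch: store each page under its canonical title and keep a small alias index from every title it was reached by. `AliasedPageCache` and its methods are illustrative names, and the canonical title is assumed to be reported by the server response.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one stored copy per canonical title, plus an alias
// index from each title a page was reached by (e.g. a redirect link title)
// to that canonical title, so previously seen links still resolve offline.
final class AliasedPageCache {
    private final Map<String, String> htmlByCanonical = new HashMap<>();
    private final Map<String, String> canonicalByAlias = new HashMap<>();

    // requestedTitle: the title in the link that was followed.
    // canonicalTitle: assumed to come from the server after redirects.
    void put(String requestedTitle, String canonicalTitle, String html) {
        canonicalByAlias.put(requestedTitle, canonicalTitle);
        canonicalByAlias.put(canonicalTitle, canonicalTitle);
        htmlByCanonical.put(canonicalTitle, html); // overwrite, never duplicate
    }

    // Resolves the link title through the alias index, then reads the single
    // stored copy; returns null for titles never seen before.
    String get(String requestedTitle) {
        String canonical = canonicalByAlias.get(requestedTitle);
        return canonical == null ? null : htmlByCanonical.get(canonical);
    }

    int pageCount() {
        return htmlByCanonical.size();
    }
}
```

With the [[Polar ice cap]] example, clicking [[ice sheets]] while online records the alias `ice sheets` to `Ice sheet`; the same click offline then finds the one cached copy, and a later load of [[Ice sheet]] updates that copy for both titles.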