Page MenuHomePhabricator

Consider increasing s-maxage for /page/mobile-html-offline-resources
Closed, ResolvedPublic

Description

Looking at https://grafana.wikimedia.org/d/000000183/mobileapps?orgId=1&fullscreen&panelId=1 I noticed that /page/mobile-html-offline-resources is our 2nd most requested endpoint behind RESTBase for the mobileapps service.

The output of this endpoint is small but fairly static:

[
  "//meta.wikimedia.org/api/rest_v1/data/css/mobile/base",
  "//meta.wikimedia.org/api/rest_v1/data/css/mobile/pcs",
  "//meta.wikimedia.org/api/rest_v1/data/javascript/mobile/pcs",
  "//en.wikipedia.org/api/rest_v1/data/css/mobile/site",
  "//en.wikipedia.org/api/rest_v1/data/i18n/pcs"
]

We currently have: cache-control: max-age=0, s-maxage=0

curl -sI https://en.wikipedia.org/api/rest_v1/page/mobile-html-offline-resources/Dog | grep \^cache\-
cache-control: max-age=0, s-maxage=0

# mobile-html for comparison:
curl -sI https://en.wikipedia.org/api/rest_v1/page/mobile-html/Dog | grep \^cache\-
cache-control: s-maxage=1209600, max-age=0, must-revalidate

Strawman: set mobile-html-offline-resources cache-control to s-maxage=1209600, max-age=0

Event Timeline

Change 607983 had a related patch set uploaded (by Jgiannelos; owner: Jgiannelos):
[mediawiki/services/mobileapps@master] Add caching on /page/mobile-html-offline-resources

https://gerrit.wikimedia.org/r/607983

I submitted a patch that adds the caching in mobile-html-offline-resources. I have a question though about the following:

> # mobile-html for comparison:
> curl -sI https://en.wikipedia.org/api/rest_v1/page/mobile-html/Dog | grep \^cache\-
> cache-control: s-maxage=1209600, max-age=0, must-revalidate

I grep'ed the codebase for 1209600 but without any results. Where is this caching header applied in the request flow?

I think this is set in the RESTBase deploy repo: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/restbase/deploy/+/master/scap/templates/config.yaml.j2. This one is in Gerrit, although the RESTBase source repo is only in GH.

Look for purged_cache_control. @Pchelolo feel free to correct me if I'm wrong.

Update: I guess we could try using purged_cache_control as is, including must-revalidate . So, in the first phase we just need to find how to apply purged_cache_control for mobile-html-offline-resources as well.
Although I suspect that the requests still would come to mobileapps if must-revalidate is included. Not sure if RB does something to avoid these requests going to the backend service. OTOH I suspect that mobile-html-offline-resources is not stored in the RB DB.

I think at least for semantic correctness we don't want purged_cache_control here, because this response isn't stored in RESTBase and therefore isn't purged.

Instead we could define something like a stable_cache_control value under mobileapps in https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/restbase/deploy/+/master/scap/templates/config.yaml.j2 to be used for rarely updated responses like this. We'd also need to pull it into the service configuration in https://github.com/wikimedia/restbase, specifically by adding response_cache_control: {{options.mobileapps.stable_cache_control}} to the options object associated with path: v1/pcs/mobile-html-offline-resources.yaml in each of projects/v1/wikipedia.wmf.yaml, projects/v1/wikivoyage.wmf.yaml, and projects/v1/wiktionary.wmf.yaml.

That being said... I notice that we are setting cache-control headers (public, max-age=86400, s-maxage=86400) for the PCS CSS & JS endpoints directly in mobileapps, and it looks like RESTBase is simply passing those through.

After giving it some more thought, particularly for a service like mobileapps which serves a variety of different endpoints with different response cachability characteristics, it makes more sense to manage the various cache-control settings within the service itself rather than putting such settings in the API gateway layer provided by RESTBase. Since we seem to have the option of doing either, I think we should merge @Jgiannelos's patch as-is, and if that doesn't give us the result we need, then we can try another approach.

(Side note: I notice that we set up this endpoint to take a title (and revision and tid) parameters despite not having any specific use for them. That's unfortunate, because it means we'll be redundantly caching the same response many times over per wiki. Arguably, that should be fixed before this change is deployed, though unfortunately we already have more than a year's worth of client releases depending on it now.)

Change 608730 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] mobile-html-offline-resources: make 'title' parameter optional

https://gerrit.wikimedia.org/r/c/mediawiki/services/mobileapps/ /608730

Change 607983 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Add caching on /page/mobile-html-offline-resources

https://gerrit.wikimedia.org/r/c/mediawiki/services/mobileapps/ /607983

I notice that we set up this endpoint to take a title (and revision and tid) parameters despite not having any specific use for them. That's unfortunate, because it means we'll be redundantly caching the same response many times over per wiki.

I added this comment to Gerrit but copying it here as well:

The only possible complication would be if this response starts needing to vary by title. For example, if we start supporting main pages or talk pages and that requires a different set of files to render properly offline. That wouldn't break existing clients because they aren't requesting those pages from mobile-html, so the parameter could be re-enabled in that case. I can't think of an example where we'd want to support different offline resources by article, but that's the only case that would break existing clients. Is there a way for clients to keep passing the title but have varnish and RESTBase ignore it for now? Also, is this actively still an issue after the maxage was raised or is this an optimization?

Arguably, that should be fixed before this change is deployed, though unfortunately we already have more than a year's worth of client releases depending on it now.

The good news is that there's only a month's worth of client releases that rely on this, two iOS and Android releases. So there's time to fix this now if necessary.

Copy pasting my response from Gerrit for visibility:

Why shouldn't we add logic to RESTBase (or whatever replaces it) to uphold the contract we made with existing clients? The contract was that clients send the page title and the server responds with the files that need to be downloaded to view that page offline. The fact that it currently doesn't vary based on title and doesn't need to be cached separately for every request should be irrelevant to the clients. I remembered the example where we'd need to vary it based on title - lazy loading references. If the main page content doesn't contain the references sections, and the page lazy loads those sections with JS, the endpoint to get the references sections for that title would need to be included in the offline-resources response.

Either way, after thinking about it some more, I think the title parameter should remain and if there's a service-side caching issue it should be solved in the service and not on the clients.

For clarity, the s-maxage cache control directive governs caching specifically in the edge caching layer (i.e., ATS/Varnish), where the cache is either hit or missed before a request hits RESTBase or mobileapps. So our options are:

(1) Have clients continue passing in a title parameter in the request URL path, and have ATS/Varnish cache many redundant copies of the single response given for every URL of the form /api/rest_v1/page/mobile-html-offline-resources/* per wiki. (This is the status quo, or would be except that mobile-html-offline-resources caching is currently being disabled completely because of T256884.)
(2) Have clients continue passing in a title parameter in the request URL path, and (if possible) add a hack to the edge caching code so that ATS/Varnish truncate the request URL for mobile-html-offline-resources requests to /api/rest_v1/page/mobile-html-offline-resources before attempting to serve a cached response.
(3) Have clients stop passing in a title param and simply request /api/rest_v1/page/mobile-html-offline-resources. My mobileapps patch & restbase PR provide for this without requiring it, and continue to support clients already shipped which send values for the title parameter, by simply making the title param optional.

I agree with @MSantos's comment on Gerrit that it's best not to add bits of application-specific special casing to other codebases, because maintainability suffers, so I am not really in favor of (2). My preference is of course for (3) since the response doesn't currently vary by title and we don't have any specific plans to make it do so in the future. But, I could live with (1) if I had to.

Change 608730 abandoned by Mholloway:
mobile-html-offline-resources: make 'title' parameter optional

Reason:
Per in-meeting discussion. The title param may be used in the future for lazy-loaded references.

https://gerrit.wikimedia.org/r/c/mediawiki/services/mobileapps/ /608730

I think unless the cache duplication is causing an issue, we should keep the contract of the API and go with option (1). We have plans to lazy-load content, which would require those pieces of content to exist in the offline-resources endpoint

Just to throw another idea out there before we close out this ticket: any reason not to allow clients to cache this response locally for a while as well? Maybe up to a day (max-age=86400)?

This is now deployed.

mholloway@mholloway:~/code/wikimedia/mediawiki-services/mobileapps$ curl -sI https://en.wikipedia.org/api/rest_v1/page/mobile-html-offline-resources/Dog | grep \^cache\-
cache-control: s-maxage=1209600, max-age=0

Just to throw another idea out there before we close out this ticket: any reason not to allow clients to cache this response locally for a while as well? Maybe up to a day (max-age=86400)?

I think that's a great idea. If we do need to vary by title we can bring it back to 0.

Change 610325 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] Mobile-html-offline-resources: Allow client-side caching up to one day

https://gerrit.wikimedia.org/r/610325

Change 610325 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Mobile-html-offline-resources: Allow client-side caching up to one day

https://gerrit.wikimedia.org/r/610325

Mentioned in SAL (#wikimedia-operations) [2020-08-17T15:19:41Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054

Mentioned in SAL (#wikimedia-operations) [2020-08-17T15:22:10Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054 (duration: 02m 30s)

Mentioned in SAL (#wikimedia-operations) [2020-08-17T15:22:25Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out

Mentioned in SAL (#wikimedia-operations) [2020-08-17T15:43:04Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out (duration: 20m 40s)

Mentioned in SAL (#wikimedia-operations) [2020-08-17T15:43:12Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out

Mentioned in SAL (#wikimedia-operations) [2020-08-17T15:44:39Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out (duration: 01m 31s)