mobile-html for offline: versioned CSS and JS
Open, Medium, Public

Description

To better support the save for offline use case it's beneficial to version the linked CSS and JS files.

Event Timeline

bearND triaged this task as Medium priority. Edited Aug 7 2018, 4:23 AM
bearND created this task.

Should we separate pagelib CSS from base CSS? It might be easier to come up with version strings for the pagelib by itself by using the version of the wikimedia-page-library. And this would also not interfere with the apps bundling the page lib CSS (while they still use the old endpoints).

+1 on that, @bearND. Also for readability purposes.

Some questions to understand better what is up:

Should we separate pagelib CSS from base CSS?

Do you mean exposing another CSS endpoint? Something like /data/css/mobile/pagelib?

It might be easier to come up with version strings for the pagelib by itself by using the version of the wikimedia-page-library.

What are the version strings for? Can you expand on how it is done now and how it would be done if we split it off? (and what we would gain from it?)

And this would also not interfere with the apps bundling the page lib CSS (while they still use the old endpoints).

Wouldn't they stop using the bundled page lib CSS once they move from mobile-sections to mobile-html? What would be the use case for keeping them both?

Re: offline caching, we have to keep in mind that the base and site CSS are versionless by definition and on the web platforms these styles are meant to apply to whatever HTML you are looking at (old or new). So maybe that should be the behavior of apps regarding those CSS endpoints, and the pagelib ones could be versioned. More on this below.


Without the context from the previous questions I can say:

  • Splitting the pagelib CSS
    • The CSS from pagelib and the one from mediawiki have different caching characteristics and will be updated at different times
    • The CSS from mediawiki (base and site) is supposed to be applied to the HTML in its latest form (at least on web, that is how it is intended to be used)
    • The CSS from pagelib is tied to the transforms run on the mobile-html endpoint. If the server changes the HTML transforms and the corresponding pagelib CSS, they should be used together in the new version, and the old version should use the old pagelib CSS
    • For the reasons I understand above, it can make sense to have them in separate endpoints
    • Pros:
      • Better cache utilization, because pagelib will probably be updated less often than the other CSS
      • Could include the pagelib CSS files into mobile-html by version and serve them at the REST layer with a version on the path, so that the HTML could reference them directly, and we could add cache-control: immutable headers.
        • Something like GET /data/css/mobile/pagelib/{version}, the HTML could point to /data/css/mobile/pagelib/1.1.0, etc.
        • We could use the same treatment for the pagelib JS too.
    • Cons:
      • More HTTP requests needed from the web view, could hurt performance
        • But maybe it doesn't matter if the connections are HTTP2
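
The versioned-path idea in the list above can be sketched as follows. This is only a minimal illustration in TypeScript, assuming the path shape /data/css/mobile/pagelib/{version} floated in this comment. The helper names (pagelibAssetPath, immutableHeaders, stylesheetLink) and the one-year max-age are hypothetical and are not part of MCS or RESTBase.

```typescript
// Hypothetical helpers; the names and the max-age value are assumptions for illustration.
type AssetKind = "css" | "javascript";

// Build a versioned path such as /data/css/mobile/pagelib/1.1.0.
function pagelibAssetPath(kind: AssetKind, version: string): string {
  return "/data/" + kind + "/mobile/pagelib/" + encodeURIComponent(version);
}

// The content behind a versioned URL never changes, so the response can be marked immutable.
function immutableHeaders(): { [name: string]: string } {
  return { "cache-control": "public, max-age=31536000, immutable" };
}

// Produce the <link> tag that mobile-html could embed for a given pagelib version.
function stylesheetLink(version: string): string {
  return '<link rel="stylesheet" href="' + pagelibAssetPath("css", version) + '">';
}
```

Because the version is part of the path, a saved page keeps pointing at exactly the CSS it was generated with, and a newer page simply references a different URL.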

Let me know if I got something wrong or I'm missing something please.

With the above, I don't see strong reasons not to split: the HTTP requests for the CSS will go to the same domain and can reuse the existing HTTP/2 connection, which should make it almost as efficient as serving everything from one endpoint.

I would like to understand the reasoning behind why we are doing it though, so I'd appreciate answers to the questions above! 🙂

Do you mean exposing another CSS endpoint? Something like /data/css/mobile/pagelib?

Yes. Exactly that.

What are the version strings for? Can you expand on how it is done now and how it would be done if we split it off? (and what we would gain from it?)

That is to ensure that the CSS and JS stay compatible with pages that apps have saved locally on the device. The page content has links to the versioned CSS and JS, so it's easy to match which version of the CSS and JS files is required. When apps save pages for offline use, they translate some referenced URLs in the page content to local URLs and save the files accordingly.
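
A rough sketch of that URL translation step, in TypeScript. Everything here is an assumption made for illustration: the URL pattern, the function name rewriteForOffline, and the local file layout (pagelib-{version}.css / .js) are hypothetical and do not describe how the Android or iOS apps actually store files.

```typescript
// Hypothetical sketch: rewrite versioned pagelib asset URLs in saved HTML to local file paths.
// The URL pattern and the local directory layout are assumptions, not the apps' real scheme.
const PAGELIB_ASSET = /(href|src)="[^"]*\/data\/(css|javascript)\/mobile\/pagelib\/([^"\/?#]+)"/g;

interface RewriteResult {
  html: string;      // page content with local URLs
  needed: string[];  // local files that must exist for this page to render offline
}

function rewriteForOffline(html: string, localRoot: string): RewriteResult {
  const needed: string[] = [];
  const rewritten = html.replace(
    PAGELIB_ASSET,
    (_match: string, attr: string, kind: string, version: string): string => {
      const ext = kind === "css" ? "css" : "js";
      const localPath = localRoot + "/pagelib-" + version + "." + ext;
      // Record each required file once, even if the page references it several times.
      if (needed.indexOf(localPath) === -1) {
        needed.push(localPath);
      }
      return attr + '="' + localPath + '"';
    }
  );
  return { html: rewritten, needed: needed };
}
```

With the version embedded in the URL, the list in needed tells the app exactly which CSS and JS snapshots to store next to the saved page.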

Wouldn't they stop using the bundled page lib CSS once they move from mobile-sections to mobile-html? What would be the use case for keeping them both?

Yes, they would, but that is taking a while. Currently the base CSS is already used by Android to create the CSS bundle that comes with the app. (iOS hasn't switched over yet, but I think that's planned as well.)

Re: offline caching, we have to keep in mind that the base and site CSS are versionless by definition and on the web platforms these styles are meant to apply to whatever HTML you are looking at (old or new). So maybe that should be the behavior of apps regarding those CSS endpoints, and the pagelib ones could be versioned. More on this below.

That is an interesting point that may require more discussion with the app devs at the next mobile-html sync. I was assuming that we don't always want the latest CSS, keeping in mind that a user could have saved a page many months ago. I'm not sure if, when, or how often apps automatically download updates for saved pages.

@Jhernandez I've created a subtask to separate the pagelib CSS. This will probably happen in three stages:

  1. Add pagelib CSS endpoint to MCS
  2. Expose pagelib CSS in RESTBase
  3. Remove pagelib CSS from base CSS and add request to it from the mobile-html endpoint.

👍 Sounds good. Should we do the JS too in parallel?

@Jhernandez On the JS side there is already a pagelib JS endpoint. In fact, it's the only JS endpoint we expose here. Nothing from RL.

Let's discuss the versioning aspect in more detail here.

  1. Do we need versioning?
  2. If so, for which of these endpoints do we need it?
    • data/css/mobile/base
    • data/css/mobile/pagelib
    • data/css/mobile/site
    • data/javascript/mobile/pagelib
  3. Should we be using the npm version for the pagelib ones? E.g. 6.1.2
  4. What should we use for the ones we get from RL? (Should we use the hash of the content or the etag from RL: etag: W/"0uhckya" -> 0uhckya?)
  5. If we need versions, should they be in the path or in a query? E.g. data/css/mobile/pagelib/6.1.2 vs. data/css/mobile/pagelib?version=6.1.2

  1. Yes
  2. All of them would be ideal
  3. It might be better to use a hash of the file contents to be consistent - data/css/mobile/pagelib/hash_goes_here. This would also keep around only one copy of the CSS if it doesn't change between versions.
  4. Hash of the content
  5. I'd vote for path
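
To make the options in this exchange concrete, here is a small TypeScript sketch of the version tokens being compared: one taken from a weak ETag, one derived from the file contents, and the path versus query URL shapes. The function names are hypothetical, and contentHash is a stand-in (a 32-bit FNV-1a-style hash over UTF-16 code units) chosen only to keep the example dependency-free; a real service would more likely use a SHA-based digest.

```typescript
// Derive a version token from a weak or strong ETag, e.g. W/"0uhckya" -> 0uhckya.
function versionFromEtag(etag: string): string | null {
  const match = /^(?:W\/)?"([^"]+)"$/.exec(etag.trim());
  return match ? match[1] : null;
}

// Stand-in content hash (32-bit FNV-1a style), for illustration only.
function contentHash(content: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    // Multiply by the FNV prime 16777619 via shifts so the value stays within 32 bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Option A: version in the path, e.g. data/css/mobile/pagelib/6.1.2
function versionedPath(endpoint: string, version: string): string {
  return endpoint + "/" + encodeURIComponent(version);
}

// Option B: version in the query, e.g. data/css/mobile/pagelib?version=6.1.2
function versionedQuery(endpoint: string, version: string): string {
  return endpoint + "?version=" + encodeURIComponent(version);
}
```

A content hash produces the same token whenever the file is byte-identical, which is what allows a single stored copy to be shared across pagelib releases that did not touch the CSS.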

I forgot the most important question: Why do we need to version the CSS and JS?
Currently the apps bundle the latest version of the CSS and JS with the app, so there is only one version available. Why do we need to change this?

Currently the apps bundle the latest version of the CSS and JS with the app, so there is only one version available. Why do we need to change this?

For offline use, the iOS app currently saves the section HTML without any transforms applied. This way, the content is decoupled from any pagelib version.

If we save the HTML from /page/mobile-html with some pagelib transforms applied, I thought we'd want to use the same version of pagelib at runtime that transformed the HTML server-side. Extending from there, it seemed reasonable to try to get an exact snapshot of the page as it was (including the base and site CSS). I don't think the versions need to stay available forever; the assumption is that we would load the newest mobile-html version if we had connectivity, and the older versions that were saved would be served from the cache.

If this is incorrect and we can mix & match pagelib versions with mobile-html versions, similar to how the web works with base & site CSS, then I'd be ok without versioning.

@Mhurd @Dbrant any thoughts?

Thank you for writing down the pros for adding versioning.

A drawback of adding versions to the CSS and JS URLs is that those are stored in RB Cassandra. Whenever we deploy a new version with an updated pagelib we would have to ask the Services team to regenerate the results or hydrate that version number. These options seem expensive on the server side.

I think the issue of the client not knowing which transformations have run server-side pertains to the JS portion only. It could also be solved by adding the version number of the pagelib inside the inline script at the end of each mobile-html output. I'm envisioning adding a simple ClientTransforms.run(), which takes the pagelib version in use when the mobile-html was generated, plus some extra parameters in an options object for the transformations that need to run client-side.
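
One possible shape for that call, sketched in TypeScript. Only the name ClientTransforms.run() and its two inputs (the pagelib version plus an options object) come from the comment above. The option fields, the 6.0.0 cut-off, and the returned list of transform names are invented purely for illustration and do not describe any existing pagelib API.

```typescript
// Hypothetical options; these field names are assumptions, not an existing pagelib interface.
interface ClientTransformOptions {
  theme?: string;
  dimImages?: boolean;
  collapseTables?: boolean;
}

// Compare dotted numeric versions such as "6.1.2"; returns negative, zero, or positive.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".");
  const pb = b.split(".");
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (parseInt(pa[i] || "0", 10) || 0) - (parseInt(pb[i] || "0", 10) || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

const ClientTransforms = {
  // Returns the names of the transforms that would run client-side for HTML
  // that the server generated with pagelib version `generatedWith`.
  run(generatedWith: string, options: ClientTransformOptions): string[] {
    const applied: string[] = [];
    if (options.theme) applied.push("theme:" + options.theme);
    if (options.dimImages) applied.push("dimImages");
    // Assumption for this sketch: from 6.0.0 on the server already collapses tables,
    // so only HTML generated by an older pagelib needs that step on the client.
    if (options.collapseTables && compareVersions(generatedWith, "6.0.0") < 0) {
      applied.push("collapseTables");
    }
    return applied;
  }
};
```

The inline script at the end of a mobile-html response could then call something like ClientTransforms.run("6.1.2", { theme: "dark" }), letting the client code decide what is still left to do for that particular snapshot.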

Jhernandez raised the priority of this task from Medium to High. Oct 7 2019, 4:05 PM
LGoto lowered the priority of this task from High to Medium. Oct 30 2019, 3:45 PM

Hey @JoeWalsh, are we set for offline stuff with mobile-html? Can we close this task?

@Jhernandez we are set for now, but this will still be necessary once a change occurs that necessitates versioning these files. It could be done proactively now or revisited when such a change occurs.