mobile-html-offline-resources endpoint
Closed, ResolvedPublic
Assigned To

Authored By

	JoeWalsh
	Feb 28 2019, 5:17 PM

Description

Background information

In order to save an article for offline viewing, the apps need to know of any related files that would need to be downloaded as well. There's currently an endpoint for media, but there should be an additional endpoint for any other related files that the article would need.

What

The mobile-html-offline-resources endpoint would take a page title and revision and return a list of related scheme-less URLs.

Details

	Subject	Repo	Branch	Lines +/-
	PCS: mobile-html-offline-resources endpoint	mediawiki/services/mobileapps	master	+115 -10

Related Objects

Status	Assigned	Task
Resolved	None	T169242 Develop Page Content Service for Reading Clients
Resolved	None	T177425 Develop General Layer of PCS
Resolved	MSantos	T217349 mobile-html-offline-resources endpoint

Event Timeline

JoeWalsh created this task.Feb 28 2019, 5:17 PM

Restricted Application added a subscriber: Aklapper.Feb 28 2019, 5:17 PM

I've been thinking we could add a JS function in the page library which would get all the resources needing to be saved for offline.
It could even do the DOM transformation before writing the file to disk if the replacement is predictable. For reading while offline we could do the same in the opposite direction, invoked after reading the stored page from disk.

Edit: Is there a way to go through a WebView when saving a page for offline from a link, similar to what you do when you cover up the WebView content during regular page load on iOS?

• bearND moved this task from Needs triage to Backlog on the Product-Infrastructure-Team-Backlog-Deprecated board.Feb 28 2019, 6:35 PM

• NHarateh_WMF subscribed.Mar 4 2019, 6:51 PM

@bearND I think the answer to that question would be no, unless I'm misunderstanding your question - when we save for offline, we don't interact with the webView at all. We initialize it when the user is about to view an article.

Bummer. That's going to be tough. I don't think Android needs something like that since they use the same networking library (and interceptors) when they save for offline. Would it be possible to instantiate a hidden WebView somewhere so that JavaScript DOM transformations can still run?

Once T217348 is merged and this endpoint is implemented, we should be fine without additional DOM transformations for offline. I wrongly assumed that it would be necessary to swap out external links for local files in the HTML. It's not required if the links are scheme-less and there's a correct content security policy in place.
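
To illustrate why scheme-less links need no rewriting: a scheme-less (protocol-relative) URL inherits its scheme from the document that references it. Below is a minimal Python sketch using only the standard library. The page URL and the `resolve` helper are assumptions made for illustration, not part of the service or the apps.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that references the resource (illustration only).
PAGE_URL = "https://en.wikipedia.org/api/rest_v1/page/mobile-html/Dog"


def resolve(resource_url: str, page_url: str = PAGE_URL) -> str:
    """Resolve a scheme-less URL against the page that references it."""
    return urljoin(page_url, resource_url)


if __name__ == "__main__":
    # The scheme is taken from the page URL; host and path come from the resource.
    print(resolve("//meta.wikimedia.org/api/rest_v1/data/css/mobile/base"))
```

The same reference resolves to an `http:` URL when the referencing page is served over `http:`, which is why the stored HTML can stay unchanged.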

Is this what you were concerned about (transforming DOM to get the links right) or were you thinking about other transformations?

Great. That's much easier. (Yes, I was thinking about the DOM transformations you'd do when changing the URLs for reading and writing external links for offline use. Not sure if you do for both or just one way.)

• Jhernandez moved this task from Backlog to Needs triage on the Product-Infrastructure-Team-Backlog-Deprecated board.Mar 6 2019, 5:38 PM

• Jhernandez mentioned this in T201384: mobile-html for offline: versioned CSS and JS.

We need to explicitly list here all of what this endpoint will provide.

My current understanding based on the discussion we had today is that the output of this endpoint should include:

all linked CSS
all linked JS
all URLs of <img> tags (media endpoint only has a subset of images because that one is meant for gallery)
possibly links to video and audio files, too?

Every item in this list should have a mime type.

We could either:

expand the /page/media endpoint to include this since there is a flag for showInGallery but that would mean it includes non-media files, which seems bad, or
add a new endpoint

• NHarateh_WMF added a project: Page Content Service.Mar 8 2019, 2:30 PM

I think /page/media/ could be removed and replaced with a unified endpoint.

It would return a list of related files, some of which have a "media" property with the same information as the old media endpoint:

[
  {
    "url": "//meta.wikimedia.org/api/rest_v1/data/css/mobile/base",
    "mime": "text/css"
  },
  {
    "url": "//meta.wikimedia.org/api/rest_v1/data/javascript/mobile/pagelib",
    "mime": "text/javascript"
  },
  {
    "url": "https://upload.wikimedia.org/wikipedia/commons/d/d9/Collage_of_Nine_Dogs.jpg",
    "mime": "image/jpeg",
    "media": {
      "section_id": 0,
      "type": "image",
      "showInGallery": true,
      "titles": {
        "canonical": "File:Collage_of_Nine_Dogs.jpg",
        "normalized": "File:Collage of Nine Dogs.jpg",
        "display": "File:Collage of Nine Dogs.jpg"
      },
      "thumbnail": {
        "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Collage_of_Nine_Dogs.jpg/320px-Collage_of_Nine_Dogs.jpg",
        "width": 320,
        "height": 281,
        "mime": "image/jpeg"
      },
      "original": {
        "source": "https://upload.wikimedia.org/wikipedia/commons/d/d9/Collage_of_Nine_Dogs.jpg",
        "width": 1665,
        "height": 1463,
        "mime": "image/jpeg"
      },
      "file_page": "https://commons.wikimedia.org/wiki/File:Collage_of_Nine_Dogs.jpg",
      "artist": {
        "html": "...html here...",
        "text": "YellowLabradorLooking_new.jpg\nGolden_Retriever_Sammy.jpg\nCockerpoo.jpg\nLonghaired_yorkie.jpg\nBoxer_female_brown.jpg\nMilù_050.JPG\nBeagle1.jpg\nBasset_Hound_600.jpg\nNewfoundland_dog_Smoky.jpg"
      },
      "license": {
        "type": "CC BY-SA 3.0",
        "code": "cc-by-sa-3.0",
        "url": "https://creativecommons.org/licenses/by-sa/3.0"
      },
      "description": {
        "html": "perros",
        "text": "perros",
        "lang": "es"
      }
    }
  }
]

/page/media does a lot more than just listing resources. It has associated metadata for the resources themselves, and their relation with the page that embeds them, like the section id for example.

I think they serve very different use cases.

I agree with Bernd’s list above, with a definitely yes for video and audio.

The mime types are for letting clients decide if they want to save certain assets for offline. For example, skipping video could be an option.

We can also consider embedding that filtering logic in the service itself, but then the clients lose some flexibility to make choices, and they have more information about the device, like network conditions and disk space, to make those decisions.
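
As a rough sketch of the client-side filtering described above, assuming each entry is an object with `url` and `mime` keys as in the examples in this thread. The `select_for_offline` helper and the video entry are hypothetical and only serve the illustration.

```python
from typing import Dict, Iterable, List

Resource = Dict[str, str]


def select_for_offline(
    resources: Iterable[Resource],
    skip_prefixes: Iterable[str] = ("video/",),
) -> List[Resource]:
    """Keep every entry whose mime type does not start with a skipped prefix."""
    skip = tuple(skip_prefixes)
    return [r for r in resources if not r.get("mime", "").startswith(skip)]


if __name__ == "__main__":
    sample = [
        {"url": "//meta.wikimedia.org/api/rest_v1/data/css/mobile/base", "mime": "text/css"},
        # Hypothetical video entry, not taken from a real response.
        {"url": "//upload.wikimedia.org/example/clip.webm", "mime": "video/webm"},
    ]
    for entry in select_for_offline(sample):
        print(entry["url"])
```

A client on a metered connection could pass `("video/", "audio/")` instead, which is the kind of device-specific choice the service cannot make on its own.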

Additionally we should investigate if it is possible to get the download size of the different assets to return for each entry. I believe it could be possible to make a HEAD request and check for the Content-Length, but it depends on who is serving the assets and if they actually include that information in that response.
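
A minimal sketch of the HEAD-request idea, using only the Python standard library. Both function names are hypothetical. Whether a given server actually returns `Content-Length` for a HEAD request is exactly the open question above, so the sketch returns `None` when the header is missing or unusable.

```python
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length as an int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None


def head_content_length(url: str, timeout: float = 10.0) -> Optional[int]:
    """Issue a HEAD request and read the advertised size, if any.

    The URL must carry a scheme, so scheme-less URLs need one added first.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers)
```

If the service did this on behalf of the clients, it would add one extra request per asset, so caching the sizes would probably be worth considering.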

Would it then make sense to leave the media endpoint as is and structure this endpoint in a way described by @JoeWalsh above?

If media entries already carry url and mime at the top level, should mime be stripped from the thumbnail and original objects?

• NHarateh_WMF mentioned this in T206856: mobile-html: iOS prototype.Mar 13 2019, 12:32 PM

• Jhernandez triaged this task as High priority.Mar 13 2019, 3:48 PM

• Jhernandez moved this task from Needs triage to Needs investigation on the Product-Infrastructure-Team-Backlog-Deprecated board.

• Jhernandez added a parent task: T177425: Develop General Layer of PCS.

• Mholloway subscribed.Mar 21 2019, 4:16 PM

Assuming we want to keep the two endpoints separate, do we need anything else in the new mobile-html-offline-resources endpoint besides the following?

[
  {
    "url": "//meta.wikimedia.org/api/rest_v1/data/css/mobile/base",
    "mime": "text/css"
  },
  {
    "url": "//meta.wikimedia.org/api/rest_v1/data/javascript/mobile/pagelib",
    "mime": "text/javascript"
  },
  {
    "url": "//upload.wikimedia.org/wikipedia/commons/d/d9/Collage_of_Nine_Dogs.jpg",
    "mime": "image/jpeg"
  }
]

I wanted to also include an example for video and audio files but I think we need to discuss these a bit more. A mime type may not be the best solution for these since there can be many derivatives for video files and some for audio files. Even for certain images there could be derivatives. So, my question is should we move to using types instead of mime types?

@bearND assuming we're keeping both endpoints, the media wouldn't need to be returned by this endpoint and we wouldn't even need mime type - it could be gathered when making the request for the file to save for offline. The resulting response would be just a list of the css and js urls:

[
  "//meta.wikimedia.org/api/rest_v1/data/css/mobile/base",
  "//meta.wikimedia.org/api/rest_v1/data/javascript/mobile/pagelib"
]

Ok, that should be quite easy then. I like the simplicity.

MSantos edited projects, added Product-Infrastructure-Team-Backlog-Deprecated (Kanban); removed Product-Infrastructure-Team-Backlog-Deprecated.Apr 17 2019, 3:48 PM

MSantos claimed this task.Apr 18 2019, 3:07 PM

MSantos moved this task from To Do to Doing on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.

@bearND and @JoeWalsh, from the description, are we still keeping revision and title as parameters?

mobile-html-offline-resources endpoint would take a page title and revision and return a list of related scheme-less URLs

Because of the simplicity of the endpoint, these parameters seem useless.

@MSantos the apps are requesting what resources are needed to render a given article and revision offline. The fact that the response is the same right now should be irrelevant to them - this way if anything changes in the future that would make the response different for different articles, the apps would be able to handle it without a client update.
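
To make the point concrete, here is a hedged sketch of how a client might build the request so that it is always keyed by title and revision. The path layout below is an assumption for illustration only; this thread does not specify the final route, and `offline_resources_url` is a hypothetical helper.

```python
from urllib.parse import quote


def offline_resources_url(domain: str, title: str, revision: int) -> str:
    """Build a per-article, per-revision request URL (path layout is assumed)."""
    # Percent-encode the title so characters such as "/" stay inside one path segment.
    encoded_title = quote(title.replace(" ", "_"), safe="")
    return (
        f"https://{domain}/api/rest_v1/page/"
        f"mobile-html-offline-resources/{encoded_title}/{revision}"
    )


if __name__ == "__main__":
    print(offline_resources_url("en.wikipedia.org", "Dog", 123456))
```

Because the client always sends both parameters, the service can later start returning article-specific resources without any change on the app side.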

Ack.

Change 504937 had a related patch set uploaded (by MSantos; owner: MSantos):
[mediawiki/services/mobileapps@master] PCS: mobile-html-offline-resources endpoint

https://gerrit.wikimedia.org/r/504937

gerritbot added a project: Patch-For-Review.Apr 18 2019, 6:32 PM

MSantos moved this task from Doing to Code Review on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.Apr 22 2019, 4:49 PM

MSantos moved this task from Code Review to Doing on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.Apr 30 2019, 3:32 PM

MSantos moved this task from Doing to Code Review on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.May 1 2019, 2:24 PM

Change 504937 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] PCS: mobile-html-offline-resources endpoint

https://gerrit.wikimedia.org/r/504937

MSantos moved this task from Code Review to To Deploy on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.May 2 2019, 1:41 PM

Maintenance_bot removed a project: Patch-For-Review.May 22 2019, 3:17 PM

MSantos moved this task from To Deploy to Sign off on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.Jun 18 2019, 10:17 PM

• bearND closed this task as Resolved.Jul 30 2019, 6:37 PM