`media` endpoint stopped working as before
Closed, ResolvedPublic

Description

Over at Inuka-Team KaiOS-Wikipedia-app we are using this endpoint to get a page's media:
https://en.wikipedia.org/api/rest_v1/page/media.

It was working as expected until now. Currently this is the result I get:
https://en.wikipedia.org/api/rest_v1/page/media/C gives me a 200
https://en.wikipedia.org/api/rest_v1/page/media/Cat gives me a 404

Questions needing answers:
1- @hueitan, why are we using page/media if it's not in the spec? https://en.wikipedia.org/api/rest_v1/#/Page%20content
2- Why was it working before and not now?
3- What can we use instead?

Event Timeline

The media API returns metadata such as license, author, and description, and I can't find an alternative that does the same thing.

The media-list endpoint is the intended replacement for the media endpoint. After retrieving the media-list, clients should call the MW action API with action=query&prop=imageinfo to get additional information like license, author, and description. Here's an example request from the iOS app: https://commons.wikimedia.org/w/api.php?action=query&format=json&iiextmetadatafilter=License|LicenseUrl|LicenseShortName|ImageDescription|Artist&iiextmetadatalanguage=en&iiextmetadatamultilang=1&iiprop=url|extmetadata|dimensions&iiurlwidth=640&prop=imageinfo&rawcontinue=&titles=File:Volcán_Ubinas,_Arequipa,_Perú,_2015-08-02,_DD_50.JPG
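As a rough illustration of the request described above, the following Python sketch builds the same kind of imageinfo URL from a list of file titles. The parameter names and values are copied from the iOS example; the function name `build_imageinfo_url` and its defaults are hypothetical, and the sketch only constructs the URL without sending it.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_imageinfo_url(titles, language="en", thumb_width=640):
    """Build an action=query&prop=imageinfo URL for one or more File: titles.

    Hypothetical helper; the parameters mirror the iOS example request.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url|extmetadata|dimensions",
        # Restrict extmetadata to the fields the client actually needs.
        "iiextmetadatafilter": "License|LicenseUrl|LicenseShortName|ImageDescription|Artist",
        "iiextmetadatalanguage": language,
        "iiextmetadatamultilang": 1,
        "iiurlwidth": thumb_width,
        "rawcontinue": "",
        # Multiple titles are separated with a pipe character.
        "titles": "|".join(titles),
    }
    return f"{COMMONS_API}?{urlencode(params)}"


print(build_imageinfo_url(["File:A.jpg", "File:B.jpg"]))
```

The titles would come from the media-list response for the article; how they are extracted from that response is left out here.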

There wasn't an announcement about the deprecation of the media endpoint, but we should fix that moving forward.

@dr0ptp4kt - should we start a new list for announcing changes to the Page Content Service or is there an existing list that would be better to use?

@JoeWalsh what was the endpoint stability of the media endpoint before it was decommissioned?

Agreed, let's do a postmortem and inspect process here as well. I'll file a task.

@hueitan and @Jpita apologies for any mixup. Does media-list get you what you need?

@dr0ptp4kt stability of the media endpoint before it was decommissioned was experimental

I'm curious to know why it stopped working now. Was this a deploy on this week's train or SWAT?

The media-list endpoint is the intended replacement for the media endpoint. After retrieving the media-list, clients should call the MW action API with action=query&prop=imageinfo to get additional information like license, author, and description.

This would work but it means a lot more network requests. It's not ideal on low-end devices.

Are there technical reasons why this is preferable for server implementation or other mobile apps?

@dr0ptp4kt stability of the media endpoint before it was decommissioned was experimental

That certainly should have been a red flag for us. We'll audit our other API usages and make sure the risk is understood by both teams.

Also, the media-list endpoint is marked as experimental too. Any plan to make it stable?

This would work but it means a lot more network requests. It's not ideal on low-end devices.

The requests can be batched. The example I sent only has one title in the titles param but you can provide multiple titles per request. According to the API sandbox info for titles: Maximum number of values is 50 (500 for clients allowed higher limits)
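A minimal sketch of that batching, assuming the standard limit of 50 titles per request: split the gallery's file titles into groups and join each group with a pipe, so each group becomes the titles value of one request. The helper name `batch_titles` and the example file names are made up for illustration.

```python
def batch_titles(titles, batch_size=50):
    """Split titles into pipe-joined groups of at most batch_size each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        "|".join(titles[i:i + batch_size])
        for i in range(0, len(titles), batch_size)
    ]


# 120 hypothetical file titles need three requests instead of 120.
titles = [f"File:Example_{n}.jpg" for n in range(120)]
batches = batch_titles(titles)
print([len(batch.split("|")) for batch in batches])  # prints [50, 50, 20]
```

Most articles have fewer than 50 gallery images, so in practice this usually means one extra request per article.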

Are there technical reasons why this is preferable for server implementation or other mobile apps?

Given the current infrastructure, the information would frequently be out of date with no good way to update it when the properties changed. The PCS endpoints get updated when the article is edited, but there was no good way to propagate changes to metadata for a given file to update all articles that contain that file. It was also expensive to make the imageinfo request on every page edit when that information didn't necessarily change. @Mholloway knows more, so feel free to jump in here with corrections or a better explanation.

Also, the media-list endpoint is marked as experimental too. Any plan to make it stable?

Yes, this will be made stable soon as both the Android and iOS apps are about to release versions that rely on it.

Yes, @JoeWalsh is correct. The fatal problem with /page/media was that it was too heavyweight to serve without pre-generating and storing responses, and we lacked the dependency tracking mechanism needed to invalidate all relevant stored responses in the event that a media item's metadata was changed. The cause of the performance issue is that it's very slow to parse extended unstructured metadata from File pages, compounded by the fact that the response often required such extended metadata for tens of files. (There is supposed to be some caching happening in MediaWiki to mitigate the performance problem, but I have a hard time believing it is working as intended, because the API response time does not improve over subsequent requests for extended metadata for the same files.)

What specific file metadata do you need, and in what context are you using it?

Thanks @JoeWalsh and @Mholloway for the detailed explanation and for the hint that we can batch the calls to the MW API.

Similar to other apps, we have a gallery view for users to browse the article's images. We show the image and its caption (when present). There's an about button that opens a popup with description, author, and license. I'll file another task to discuss the options for how to redo it.

A couple of follow-up questions:

  • Is this currently affecting live users, or is the gallery feature still in development?
  • If it's live, how many users are affected?

If necessary (e.g., if this is causing user-facing errors), we could turn /page/media back on (without Cassandra storage) for the time being while you evaluate options and timeline for moving to newer and better stuff. (cc @Pchelolo as a heads-up)

On a side note, some if not all of these file metadata fields may now be taking advantage of Structured Data on Commons, which should be preferred over action=query&iiprop=extmetadata, though unfortunately I don't believe that the Wikibase APIs providing access to structured file metadata allow batched requests.

@Mholloway don't worry, this is not in production yet.

SBisson claimed this task.

The basic gallery view has been restored using the media-list endpoint and a task has been created to fetch the metadata from MW API: T248611
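For reference, here is a hedged sketch of how the description, author, and license might be pulled out of an imageinfo response once it has been fetched. It assumes the default JSON layout of the action API (pages keyed by page ID, each with an imageinfo list carrying extmetadata) and assumes that with iiextmetadatamultilang=1 a field's value may be either a plain string or a mapping of language code to string. The function name `extract_file_metadata` is hypothetical.

```python
def extract_file_metadata(response, language="en"):
    """Map each file title to its description, author, and license."""

    def text(field):
        value = (field or {}).get("value")
        if isinstance(value, dict):
            # Assumed multilingual form: prefer the requested language,
            # else fall back to the first entry that is not a "_type" marker.
            fallback = next(
                (v for k, v in value.items() if not k.startswith("_")), None
            )
            return value.get(language) or fallback
        return value

    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Missing files have no imageinfo; treat them as empty metadata.
        infos = page.get("imageinfo") or [{}]
        meta = infos[0].get("extmetadata", {})
        result[page["title"]] = {
            "description": text(meta.get("ImageDescription")),
            "author": text(meta.get("Artist")),
            "license": text(meta.get("LicenseShortName")),
        }
    return result
```

Values such as Artist and ImageDescription often contain HTML, so a client would likely need to sanitize or strip markup before showing them in a popup.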

I'm going to call this resolved. I don't see anything more that needs to be done here. Please reopen if I'm missing something.

Thanks all for being so responsive!

One final note: you may also want to follow T230845, which is a task to put a similar page media endpoint in the MediaWiki REST API as part of MediaWiki core.