
Implement get media file endpoint
Open, Medium | Public | 5 Estimated Story Points

Description
This endpoint allows readers to retrieve specified media files as described in T230848.

Requirements

  • Implement endpoint described by T230848
  • Implement integration test covering expected behaviour
  • Add documentation for endpoint

Event Timeline

WDoranWMF set the point value for this task to 5. (Oct 30 2019, 10:46 PM)

Change 552343 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Add a core REST API endpoint for media file metadata

https://gerrit.wikimedia.org/r/552343

Change 552343 merged by jenkins-bot:
[mediawiki/core@master] Add a core REST API endpoint for media file metadata

https://gerrit.wikimedia.org/r/552343

Hi @BPirkle, Here are the docs for this endpoint for your review: https://www.mediawiki.org/wiki/API:REST_API/Media_files

I've also put together a draft of a Jupyter Notebook demonstrating the use of this endpoint: https://paws-public.wmflabs.org/paws-public/User:APaskulin_(WMF)/en-wikipedia-images.ipynb

Let me know what you think. Feel free to edit the pages directly, provide feedback via this task or the doc talk page, or let me know and I can schedule a synchronous chat if that's easier.

Hi @apaskulin ,

This looks great! A few comments follow.

  1. Upon reading this and rereading some of the other endpoint documentation, I notice that we're using a different example parameter between the "Request examples" section and the "Request parameters" section. For example, on the /file endpoint documentation the "Request examples" section says:

curl https://commons.wikimedia.org/w/rest.php/v1/file/File:The_Blue_Marble.jpg

While the "Request parameters" section includes this table:

parameter | required | example | description
title     | required | My file | File title

Is it intentional that the first uses "File:The_Blue_Marble.jpg" while the second uses "My file"? I notice that on some endpoints (e.g. https://www.mediawiki.org/wiki/API:REST_API/History) there are multiple examples with different parameters (which is great!), so maybe the intention is to be consistent across endpoints: an actual, working parameter for the "examples" section and a true placeholder for the table?

What caught my attention in this example was that I was expecting to see something like "File:foo" in the table, but instead I saw a string without the file namespace specified, and containing a space. Maybe "File:My_file" for the table?

  2. In the 403 response section, the message will say "The user does not have rights to read title". Note the "s" at the end of "rights", which is omitted on the doc page. I noticed this same discrepancy on other endpoints, so I assume it was copied and pasted. We may have changed this message string in the code after you created the initial draft of the docs. This message string is used in the code for:
    • MediaFileHandler (/file)
    • LanguageLinksHandler (/page/{title}/links/language)
    • MediaLinksHandler (/page/{title}/links/media)
    • PageHTMLHandler (/page/{title}/{html_type})
    • PageHistoryCountHandler (/page/{title}/history/counts/{type})
    • PageHistoryHandler (/page/{title}/history)
    • PageSourceHandler (/page/{title})
  3. In the preferred/original/thumbnail section, I suggest changing the word "applicable" to "available". Background: some of these fields are a little less predictable than I'd like. The way this technically works is that MediaWiki core defers creation of the actual scaled image on disk until it is needed. So, for example, the "thumbnail" file may not exist until someone asks for it. This means that some fields (notably "size") may or may not have a non-null value, depending on the state of the filesystem and who has hit which links. So the "size" of a thumbnail is very much applicable for that image type, and callers may wish they had that value. But if we don't have it available, we won't be able to send it.
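To illustrate the "available" point for callers, here is a minimal Python sketch of building the request URL and reading "size" defensively. The helper names (`file_endpoint_url`, `available_sizes`) and the sample response are hypothetical and its values are invented for illustration; the response is assumed to carry "preferred", "original" and "thumbnail" objects, any of which (or their "size") may be null. The title is passed with its namespace, as in the curl example above.

```python
from urllib.parse import quote

# Base URL taken from the curl example in the docs.
BASE = "https://commons.wikimedia.org/w/rest.php/v1/file/"


def file_endpoint_url(title: str) -> str:
    """Build the request URL for a file title (hypothetical helper)."""
    # Spaces become underscores; everything else unsafe is percent-encoded,
    # while ':' is kept so the namespace prefix stays readable.
    return BASE + quote(title.replace(" ", "_"), safe=":")


def available_sizes(response: dict) -> dict:
    """Map each variant to its size, or None when it is not available."""
    sizes = {}
    for variant in ("preferred", "original", "thumbnail"):
        info = response.get(variant) or {}  # the whole object may be null
        sizes[variant] = info.get("size")   # the field itself may be null
    return sizes


# Invented sample data, NOT real API output.
sample = {
    "title": "Example.jpg",
    "preferred": {"size": None, "width": 800},
    "original": {"size": 123456, "width": 3000},
    "thumbnail": None,
}

print(file_endpoint_url("File:My file"))
print(available_sizes(sample))
```

A caller that needs the thumbnail size would have to treat None as "not generated yet" rather than as an error.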

Thank you so much for this thoughtful and detailed feedback! I've applied the suggestions here.

Looks great, as does the corresponding change mentioned on T236169! How do I +2 a doc change? ;-)

  • I would add at least mimetype. It's more important than mediatype (which is honestly more an internal concept that means very little, other than in search and filter interfaces where we do maintenance).
  • If you add thumbnails in this, then you need to provide more parameters to determine which thumbnail you want returned
    • What determines the size of the thumbnail in the thumbnail section? Consider that we have images over 18000px wide, but also images of 1px.
    • SVGs can be multilingual and you need to tell it what language to render in
    • Similar concerns for multipage images like DJVU and TIFF
    • Video timecode offset of the frame of which you want a thumbnail (aka poster)
  • Audio/Video needs things like derived transcodes, subtitles etc.
  • What is 'preferred' (nvmd, found the explanation)

I'd consider any type of thumbnail as a 'derived' version, of which there can be multiple (and many now tend to come with custom HTML requirements and rendering modules).

We should attempt to get this right, because it's a bit of an organically grown mess in the action api.

Maximum recommended image width in pixels

What does this mean for infinitely scalable resources like SVG?

The size of SVG originals refers to their 'preferred' size (as indicated in the SVG XML root), but capped at a certain number of pixels.

And thumbnails can scale both up and down from the original sizes of images (there is no real limit).

Another potential issue I see is the combination of title, file_description_url and Commons shared files...
On a local wiki, your image will be Datei:Example.png. But when hosted on Commons, the description URL will be //commons.wikimedia.org/wiki/File:Example.png

Since there is no 'imagerepository' indicator, nor a canonicaltitle like in the action API, that means if you want to create a wgTitle or something, you now need to guess the correct form of the namespace: 'Datei' or 'File'. This might not sound like a big deal ('simply use canonical namespace names everywhere'), but you'll find that local communities care a lot about their localizations.

I'd consider either adding 'pagename' (analogous to wgPageName) or a namespace param in the response.. assuming you don't want 'imagerepository', which is 'scary' :D
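To make the problem concrete, here is a rough Python sketch of the guessing a client is left with today. It assumes the title carries the localized namespace prefix as in the example above; `namespace_prefix`, `describe_file` and the local host name are hypothetical, not part of any API.

```python
from urllib.parse import unquote, urlparse


def namespace_prefix(title_or_path: str) -> str:
    """Return the text before the first ':' of the last path segment, or ''."""
    name = unquote(title_or_path.rsplit("/", 1)[-1])
    prefix, sep, _ = name.partition(":")
    return prefix if sep else ""


def describe_file(local_host: str, title: str, file_description_url: str) -> dict:
    """Compare the local title with the description URL (heuristic only)."""
    parsed = urlparse(file_description_url)  # handles protocol-relative URLs
    host = parsed.netloc or local_host
    return {
        # A different host suggests the file lives on a shared repository.
        "shared": host != local_host,
        "local_namespace": namespace_prefix(title),
        "description_namespace": namespace_prefix(parsed.path),
    }


info = describe_file(
    "de.wikipedia.org",  # assumed local wiki, for illustration
    "Datei:Example.png",
    "//commons.wikimedia.org/wiki/File:Example.png",
)
print(info)
```

The client ends up with two different namespace names and only a host comparison to decide which applies, which is the kind of interpretation an explicit 'pagename' or namespace field would avoid.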

BPirkle subscribed.

Adding the missing MediaWiki-REST-API code project tag, as the Core Platform Team Initiatives (MW REST API in PHP) team tag is archived and its parent Platform Engineering team does not exist anymore.