Develop a JSON API that returns all media for a page.
This includes page images, audio files, and video files.
See also: https://www.mediawiki.org/wiki/Specs/HTML/1.6.0
General Logic
Commons Info
- For each item, we should query the Commons API to get licensing, captions, file page URL, author, MIME type, etc.
Open question: can the MIME type differ within a single item? If so, do we need sub-dictionaries for MIME types?
For example, for images, can the thumb and the original have two different MIME types?
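As a minimal sketch of the Commons lookup described above, the request could be built against the MediaWiki `imageinfo` query module. The helper name `build_commons_query` and the thumbnail width of 320 are assumptions for illustration only. Requesting both `mime` and `thumbmime` is one possible way to investigate the open question about per-size MIME types.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_commons_query(titles, thumb_width=320):
    """Build a Commons API URL asking for imageinfo on the given file titles.

    `thumb_width` is an assumed value; `thumbmime` is only reported when
    `url` and `iiurlwidth` are part of the request.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        # url: file and description page URLs; mime/thumbmime: MIME types;
        # size: dimensions, duration, page count; extmetadata: license, author, etc.
        "iiprop": "url|mime|thumbmime|size|extmetadata",
        "iiurlwidth": str(thumb_width),
        "titles": "|".join(titles),
    }
    return COMMONS_API + "?" + urlencode(params)


url = build_commons_query(["File:Example.jpg", "File:Example.ogg"])
```

Several titles are batched into one request here to keep the number of round trips low; whether batching fits the service's limits is left open.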
Wikipedia Captions (or original wiki caption)
- In addition to the Commons caption, we should also return the Wikipedia caption from the page where the image is located.
Section Information
- We should return the ID of the section that contains each item.
Media Type
- Each item in the array should be given a type to denote what kind of object it is. Proposed types: "Image", "Video", "Audio"
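One possible way to assign the proposed types is to derive them from the item's MIME type. This is only a sketch: the function name `media_type_from_mime` is hypothetical, and container types such as `application/ogg` (which may hold audio or video) are deliberately left unclassified rather than guessed.

```python
from typing import Optional

# Maps the major part of a MIME type to one of the proposed item types.
_TYPE_BY_MAJOR = {"image": "Image", "video": "Video", "audio": "Audio"}


def media_type_from_mime(mime: str) -> Optional[str]:
    """Return "Image", "Video", or "Audio", or None when the MIME type is ambiguous."""
    major = mime.split("/", 1)[0].strip().lower()
    return _TYPE_BY_MAJOR.get(major)
```

For example, `media_type_from_mime("audio/ogg")` yields `"Audio"`, while `media_type_from_mime("application/pdf")` yields `None` and would need separate handling.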
Extended metadata
- extmetadata items of interest should appear at the top level of each item
Misc
- Return page_count for paged file types (PDF, TIFF, DjVu...)
Type specific logic and metadata
Images
This essentially replaces the following client-side logic:
- parsing out image URLs
- rewriting URLs to get both original and thumb sizes
- identifying images that are too small for a gallery
The API should return an array of image objects (with thumb and full sizes), as in other MCS APIs, with the other information included.
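To make the shape of such an image object concrete, here is a rough sketch. Every field name (`title`, `section_id`, `thumbnail`, `original`, `caption`, `type`) and both URLs are placeholders, not a settled schema. Each size carries its own MIME type here because SVG originals are typically served with PNG thumbnails.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ImageVariant:
    """One rendition of an image (hypothetical shape)."""
    source: str
    mime: str


@dataclass
class ImageItem:
    """One entry in the returned media array (hypothetical shape)."""
    title: str
    section_id: int
    thumbnail: ImageVariant
    original: ImageVariant
    caption: Optional[str] = None
    type: str = "Image"


item = ImageItem(
    title="File:Example.svg",
    section_id=2,
    # Placeholder URLs, not real file locations.
    thumbnail=ImageVariant("https://example.org/thumb/Example.svg.png", "image/png"),
    original=ImageVariant("https://example.org/Example.svg", "image/svg+xml"),
)

# asdict() converts nested dataclasses too, so the result is JSON-serializable.
payload = json.dumps([asdict(item)])
```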
Images that are too small for the gallery
- Stop filtering images based on file type (SVG, PNG).
- Start filtering images based on their size on the page (reject if width or height < 64px).
- Add mw-gallery markup with grouping (T182330)
- Add filtering for the noviewer and metadata classes
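The size and class rules above could be sketched as a single predicate. The function name `is_gallery_candidate` and its parameters are hypothetical, and the sketch assumes the caller has already collected the rendered width, height, and relevant CSS classes for the image (including any taken from ancestor elements, if those should count).

```python
MIN_GALLERY_DIMENSION = 64  # pixels, per the rule "reject if width or height < 64px"
EXCLUDED_CLASSES = frozenset({"noviewer", "metadata"})


def is_gallery_candidate(width, height, css_classes=()):
    """Return True if an image should be kept for the gallery.

    File type is intentionally not considered; only on-page size and
    the excluded CSS classes are checked.
    """
    if width < MIN_GALLERY_DIMENSION or height < MIN_GALLERY_DIMENSION:
        return False
    return EXCLUDED_CLASSES.isdisjoint(css_classes)
```

An image exactly 64px on each side passes, since the rule rejects only dimensions strictly below 64px.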
Audio/Video
- We should return duration.
- If start and end times are specified, we should return them.
- For the video's poster (image), we should return the time of the frame it shows.
- Add captioning info (exposed via <track> elements?). Created T185263 to discuss.
Open question: anything else?
Audio special cases
- We should mark pronunciations as a special kind of media file
- We should mark Spoken Wikipedia audio files as a special kind of file