Develop a JSON API that returns all media for a page.
This includes page images, audio files, and video files.
## General Logic
---
### Commons Info
[x] For each item, query the Commons API to get licensing, captions, file page URL, author, MIME type, etc.
Open question: can a single item have more than one MIME type? If so, do we need sub-dictionaries for MIME types?
For example, for images, can the thumbnail and the original have two different MIME types?
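As a rough illustration of the Commons lookup and of the MIME type question, the sketch below builds `imageinfo` query parameters and reads the original and thumbnail MIME types separately. The helper names (`build_imageinfo_params`, `mime_by_size`) and the output keys are hypothetical, and the sample record is hand-written for illustration rather than a captured API response. It assumes the `mime` and `thumbmime` values of `iiprop`, and uses an SVG as the example because SVG thumbnails are rasterized.

```python
from typing import Any, Dict, List, Optional

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_imageinfo_params(titles: List[str], thumb_width: int = 320) -> Dict[str, str]:
    """Build query parameters for a Commons imageinfo request (no network call here)."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "imageinfo",
        "titles": "|".join(titles),
        # url -> file and description page URLs, extmetadata -> license, author, etc.
        "iiprop": "url|mime|thumbmime|size|extmetadata",
        "iiurlwidth": str(thumb_width),
    }


def mime_by_size(imageinfo: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Return one MIME type per rendition (hypothetical sub-dictionary layout)."""
    return {
        "original": imageinfo.get("mime"),
        "thumbnail": imageinfo.get("thumbmime"),
    }


# Hand-written sample imageinfo entry, for illustration only.
sample_imageinfo = {
    "mime": "image/svg+xml",
    "thumbmime": "image/png",
    "descriptionurl": "https://commons.wikimedia.org/wiki/File:Example.svg",
}

params = build_imageinfo_params(["File:Example.svg", "File:Example.jpg"])
mimes = mime_by_size(sample_imageinfo)
```

If the two values can differ, as in this sample, a per-rendition sub-dictionary like the one returned by `mime_by_size` is one possible answer to the open question.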
### Wikipedia Captions (Or original Wiki caption)
[x] In addition to the Commons caption, we should also return the Wikipedia caption from the page on which the image appears.
### Section Information
[ ] We should return the section in which each object is contained. **Status: blocked on Parsoid sections deployment** (T114072)
### Media Type
[x] Each item in the array should be given a type to denote what kind of object it is. Proposed types: "Image", "Video", "Audio"
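One way the type could be derived is from the top-level part of the MIME type. This is a minimal sketch under that assumption. The function name `media_type` is hypothetical, and the actual service may well classify items from the page markup instead. Container types such as `application/ogg` do not say whether the file is audio or video, so the sketch returns `None` for anything it cannot classify.

```python
from typing import Optional

# Top-level MIME type -> proposed item type.
TYPE_BY_MIME_PREFIX = {
    "image": "Image",
    "video": "Video",
    "audio": "Audio",
}


def media_type(mime: str) -> Optional[str]:
    """Map a MIME type to "Image", "Video" or "Audio"; None if it is ambiguous."""
    top_level = mime.split("/", 1)[0].strip().lower()
    return TYPE_BY_MIME_PREFIX.get(top_level)


types = [media_type(m) for m in ("image/jpeg", "video/webm", "audio/wav", "application/ogg")]
```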
### Extended metadata
[x] extmetadata items of interest appear at the top level of each item
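The hoisting step could look roughly like the sketch below. In the Commons API each `extmetadata` entry is an object with a `value` field, so the sketch copies selected values onto the item itself. Which fields count as "of interest" and the output key names (`license`, `artist`, and so on) are assumptions made for illustration, as is the sample data.

```python
from typing import Any, Dict

# Hypothetical selection: output key -> extmetadata key.
FIELDS_OF_INTEREST = {
    "license": "LicenseShortName",
    "license_url": "LicenseUrl",
    "artist": "Artist",
    "description": "ImageDescription",
}


def hoist_extmetadata(item: Dict[str, Any], extmetadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of item with selected extmetadata values at the top level."""
    result = dict(item)
    for out_key, ext_key in FIELDS_OF_INTEREST.items():
        entry = extmetadata.get(ext_key)
        if entry is not None and "value" in entry:
            result[out_key] = entry["value"]
    return result


# Hand-written sample, for illustration only.
sample_extmetadata = {
    "LicenseShortName": {"value": "CC BY-SA 4.0", "source": "commons-desc-page"},
    "Artist": {"value": "Jane Example", "source": "commons-desc-page"},
}

hoisted = hoist_extmetadata({"title": "File:Example.jpg", "type": "Image"}, sample_extmetadata)
```

Fields that are missing from `extmetadata` are simply left out of the item rather than set to a placeholder.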
### Misc
[ ] Specially handle items with links pointing somewhere besides the file page (T182329)
[x] Return page_count for paged file types (PDF, TIFF, DjVu...)
## Type specific logic and metadata
---
### Images
This essentially replaces the client-side logic for:
- parsing out image URLs
- rewriting URLs to get both original and thumb sizes
- identifying images that are too small for a gallery
[x] The API should return an array of image objects (with thumb and full sizes), as in other MCS APIs, with the other information described above included.
##### Images that are too small for the gallery
[x] We should continue to employ the current filtering logic.
[ ] Add mw-gallery markup with grouping **TODO: create subtask**
[x] Add filtering for noviewer, metadata classes
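The size and class filtering might be sketched as follows. The 64-pixel threshold is a placeholder, not the value used by the current filtering logic, and the assumption that the `noviewer` and `metadata` classes are collected from the image and its ancestor elements is also illustrative. The name `is_gallery_candidate` is hypothetical.

```python
from typing import Iterable

MIN_GALLERY_DIMENSION = 64  # placeholder threshold in pixels, not the real value
EXCLUDED_CLASSES = frozenset({"noviewer", "metadata"})


def is_gallery_candidate(width: int, height: int, classes: Iterable[str]) -> bool:
    """Keep an image only if it is large enough and carries no excluded class."""
    if EXCLUDED_CLASSES.intersection(classes):
        return False
    return width >= MIN_GALLERY_DIMENSION and height >= MIN_GALLERY_DIMENSION


# Hand-written sample data, for illustration only.
images = [
    {"title": "File:Photo.jpg", "width": 800, "height": 600, "classes": ["thumbimage"]},
    {"title": "File:Icon.png", "width": 20, "height": 20, "classes": []},
    {"title": "File:Stub.svg", "width": 200, "height": 200, "classes": ["metadata"]},
]

gallery_titles = [
    img["title"]
    for img in images
    if is_gallery_candidate(img["width"], img["height"], img["classes"])
]
```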
### Audio/Video
[x] We should return duration.
[x] If start and end times are specified, we should return them
[x] The poster (image) for the video should reflect the time of the frame it shows
Open question: anything else?
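As an illustration of returning start and end times, the sketch below reads them from a temporal media fragment (`#t=start,end`) on a source URL. Whether the page markup actually exposes the times this way is an assumption, and the function name and the item keys (`start_time`, `end_time`) are hypothetical. Only plain seconds are handled, not `hh:mm:ss` or `npt:` forms.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_temporal_fragment(url: str) -> Tuple[Optional[float], Optional[float]]:
    """Extract (start, end) in seconds from a '#t=start,end' fragment, if present."""
    fragment = urlsplit(url).fragment
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name == "t" and value:
            start_text, _, end_text = value.partition(",")
            start = float(start_text) if start_text else None
            end = float(end_text) if end_text else None
            return start, end
    return None, None  # no temporal fragment: times were not specified


start, end = parse_temporal_fragment("https://example.org/media/clip.webm#t=10,20")

# Hypothetical item shape; the duration would come from the file metadata.
video_item = {"type": "Video", "duration": 95.0}
if start is not None:
    video_item["start_time"] = start
if end is not None:
    video_item["end_time"] = end
```

Omitting the keys when no times are given keeps "not specified" distinct from an explicit start of zero.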
##### Audio special cases
[x] We should mark pronunciations as a special kind of media file
[x] We should mark Spoken Wikipedia audio files as a special kind of file