
Document current MediaWiki thumbnail URL format & processing logic
Closed, Resolved · Public

Description

Current URL format

There are four "API endpoints" for MediaWiki which can be used to request thumbnails:

  • thumb.php: used mostly when $wgGenerateThumbnailOnParse and transformVia404 are both false. The endpoint is at $wgThumbnailScriptPath (e.g. https://commons.wikimedia.org/w/thumb.php).
  • direct/404 handler: these URLs typically reflect how images are stored on disk, and MediaWiki relies on the web server (or something else, like Varnish) to map URLs to files. With $wgGenerateThumbnailOnParse set to true, files will be pre-generated and this is not really an API. With that setting false and transformVia404 true, these URLs are mapped (again by the web server or some other external component) to thumb_handler.php, a wrapper around thumb.php that takes a nice URL and converts it into a parameter array. The URL format is <zone prefix>/[archive/|temp/][<first few characters of md5>/]<filename> for originals and <zone prefix>/[archive/|temp/][<first few characters of md5>/]<filename>/<param-string><maybe-filename><maybe-extension> for thumbnails. For example: https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Camera2_mgx.svg/100px-Camera2_mgx.svg.png
    • <maybe-filename> is the filename, or just thumbnail if that would be too long.
    • <maybe-extension> is the extension, if it is different from that of the original.
    • <zone prefix> is whatever is returned by FileRepo::getZoneUrl($zone) (which can be configured via $wgLocalFileRepo / $wgForeignFileRepos). For Wikimedia wikis this is https://upload.wikimedia.org/<project>/<language>/ for originals and https://upload.wikimedia.org/<project>/<language>/<zone> for most other zones (most notably https://upload.wikimedia.org/<project>/<language>/thumb for thumbnails). The hash is the MD5 hash of the file name, and the number of hash characters to put in the path is also configurable via the repos.
    • for old versions of files (which are under /archive, or /temp if they are also temp files), the file name takes the form <YYYYMMDDHHMMSS timestamp>!<original filename>. The timestamp is included in the md5 calculation for normal files, but not for temp files.
  • img_auth.php: used by private wikis to check for permissions. It will return thumbnails if they exist but will not render them. URLs are more or less in the form <$wgScriptPath>/img_auth.php/<filename> or <$wgScriptPath>/img_auth.php/<thumbname> but can be extended via $wgImgAuthUrlPathMap or the ImgAuthBeforeStream hook.
  • upload stash: shows uploaded-but-not-yet-published files of the current user. Available via the special page Special:UploadStash/file/<stash key> for originals and Special:UploadStash/file/<stash key>/<thumb name> for thumbs. The stash key is returned by the upload API; the thumb name is the last path segment of the normal (404 handler) thumb URL. Depending on $wgUploadStashScalerBaseUrl, this endpoint might actually render thumbnails or just proxy to the 404 handler.
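
As an illustration of the direct/404 handler format, here is a minimal Python sketch that assembles a thumbnail URL from its components. It is not MediaWiki code: the function names, the max_name_len threshold and the "-" separator between the parameter string and the file name (inferred from the example URL above) are assumptions, the hash prefix is taken as an input rather than computed, and no URL encoding is done.

```python
def thumb_name(filename, param_string, thumb_ext=None, max_name_len=160):
    """Build <param-string><maybe-filename><maybe-extension>.

    max_name_len is a hypothetical threshold; the real limit is
    configured in MediaWiki and is not stated in this task.
    """
    original_ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    # <maybe-filename>: the filename, or just "thumbnail" if it would be too long
    maybe_filename = filename if len(filename) <= max_name_len else "thumbnail"
    # <maybe-extension>: only added when it differs from the original extension
    maybe_extension = ""
    if thumb_ext and thumb_ext.lower() != original_ext:
        maybe_extension = "." + thumb_ext
    # The "-" separator is inferred from the example URL (assumption)
    return f"{param_string}-{maybe_filename}{maybe_extension}"


def thumb_url(zone_prefix, hash_prefix, filename, param_string,
              thumb_ext=None, subdir=""):
    """Join the pieces of a 404-handler thumbnail URL.

    hash_prefix (e.g. "6/66") is supplied by the caller;
    subdir is "", "archive" or "temp".
    """
    parts = [zone_prefix.rstrip("/")]
    if subdir:
        parts.append(subdir)
    if hash_prefix:
        parts.append(hash_prefix.strip("/"))
    parts.append(filename)
    parts.append(thumb_name(filename, param_string, thumb_ext))
    return "/".join(parts)


url = thumb_url(
    "https://upload.wikimedia.org/wikipedia/commons/thumb",
    "6/66", "Camera2_mgx.svg", "100px", thumb_ext="png",
)
print(url)
```

With the values taken from the example above, this reproduces the Camera2_mgx.svg thumbnail URL.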

Also, Special:Redirect and Special:FilePath can take a width parameter and redirect to the appropriate thumbnail URL.
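
A small Python sketch of building such a redirecting URL for Special:FilePath follows. The helper name and the wiki base URL (the usual /wiki/ article path) are assumptions for illustration; only the width parameter comes from the description above.

```python
from urllib.parse import quote, urlencode


def filepath_redirect_url(wiki_base, filename, width):
    """Return a Special:FilePath URL that asks for a thumbnail of `width` pixels.

    wiki_base is assumed to be the article path root,
    e.g. "https://commons.wikimedia.org/wiki".
    """
    title = "Special:FilePath/" + filename.replace(" ", "_")
    query = urlencode({"width": width})
    # Keep ":" and "/" unescaped so the title stays readable
    return f"{wiki_base.rstrip('/')}/{quote(title, safe=':/')}?{query}"


print(filepath_redirect_url("https://commons.wikimedia.org/wiki", "Camera2_mgx.svg", 100))
```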

Parameter handling

Processing of 404 handler URLs (the other endpoints do some subset of this):

  • for the 404 handler, convert the URL into parameters (thumb.php just uses the query params):
    • f => the filename (penultimate path segment)
    • thumbName => the thumbnail name, with parameters (the last path segment)
    • archive/temp => flags for this being the thumbnail of an old version or temporary file
    • rel404 => the whole path, more or less
  • normalize some params (BC): w => width, h => height, p => page
  • add params from ExtractThumbParameters hook (deprecated)
    • PagedTiffHandler (BC only): lossy, page, width
    • OggHandler (discontinued in MW 1.24): seek
  • unset thumbName, add params from MediaHandler::parseParamString(<param-string>)
    • ImageHandler: width
    • BitmapHandler: interlace
    • JpegHandler: quality
    • SvgHandler: lang, width
    • DjVuHandler: page, width
    • PdfHandler: page, width
    • PagedTiffHandler: lossy => [lossy|lossless], page, width
    • MP3MediaHandler: does not take any params
    • TimedMediaHandler: width, seek
  • unset thumbName/archived/temp/f/rel404/r (but use them to select the right repo/file)
  • handle 'download' parameter
  • call File::transform with the collected parameters
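
The steps above can be sketched roughly as follows in Python. This is a simplified model, not the thumb.php / thumb_handler.php logic itself: the function names are hypothetical, hooks and the 'download' parameter are left out, only a width-style parameter string such as "100px" (as in the example URL) is parsed, and the path is assumed to be already relative to the thumb zone prefix.

```python
import re

# Backwards-compatibility aliases: w => width, h => height, p => page
BC_ALIASES = {"w": "width", "h": "height", "p": "page"}


def extract_thumb_request(rel_path):
    """Turn a path relative to the thumb zone into a raw parameter array."""
    rel = rel_path.strip("/")
    segments = rel.split("/")
    if len(segments) < 2:
        raise ValueError("expected at least <filename>/<thumbname>")
    params = {
        "f": segments[-2],          # penultimate path segment
        "thumbName": segments[-1],  # last path segment
        "rel404": rel,              # the whole path, more or less
    }
    if segments[0] == "archive":
        params["archived"] = 1
    elif segments[0] == "temp":
        params["temp"] = 1
    return params


def normalize_bc(params):
    """Rename the short legacy keys to their long forms."""
    out = dict(params)
    for old, new in BC_ALIASES.items():
        if old in out:
            value = out.pop(old)
            out.setdefault(new, value)
    return out


def parse_param_string(thumb_name, filename):
    """Width-only stand-in for a handler's parseParamString (assumption)."""
    for candidate in (filename, "thumbnail"):
        idx = thumb_name.rfind("-" + candidate)
        if idx > 0:
            match = re.fullmatch(r"(\d+)px", thumb_name[:idx])
            return {"width": int(match.group(1))} if match else {}
    return {}


def collect_transform_params(rel_path):
    """Return (file selection info, parameters to pass to the transform)."""
    params = normalize_bc(extract_thumb_request(rel_path))
    thumb = params.pop("thumbName")
    params.update(parse_param_string(thumb, params["f"]))
    # These keys select the repo/file and are not transform parameters
    file_info = {k: params.pop(k)
                 for k in ("f", "archived", "temp", "rel404", "r")
                 if k in params}
    return file_info, params


file_info, transform_params = collect_transform_params(
    "6/66/Camera2_mgx.svg/100px-Camera2_mgx.svg.png"
)
print(file_info["f"], transform_params)
```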

Other parameter-handling-related code

  • ApiQueryImageInfo/ApiQueryStashImageInfo use MediaHandler::parseParamString for the urlparam API parameter
  • MediaHandler::makeParamString: inverse of MediaHandler::parseParamString - turn parameter array into a thumb URL filename prefix
  • MediaHandler::normaliseParams: idempotent post-processing of the parameter array returned by parseParamString, which may also abort processing. In the core/gerrit handlers it's used to change invalid width/height/page values, set physicalWidth/physicalHeight, validate the interlace parameter, and abort if various other parameters have unexpected values.
  • MediaHandler::validateParam: this is documented as a check for wikitext image markup parameters, which are not necessarily the same as thumb.php URL parameters, but ApiQueryImageInfo seems to use it to validate the return of parseParamString. Not sure what's up with that.
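
To make the relationship between these methods concrete, here is a toy Python model of a paged handler. It only illustrates the two properties described above (makeParamString as the inverse of parseParamString, and normaliseParams being idempotent); the function names, the "page<N>-<W>px" string format and the clamping rules are assumptions, not the behavior of any particular MediaHandler subclass.

```python
import re


def make_param_string(params):
    """Turn a parameter array into a thumb name prefix (toy format)."""
    if "page" in params:
        return f"page{params['page']}-{params['width']}px"
    return f"{params['width']}px"


def parse_param_string(param_string):
    """Inverse of make_param_string; returns None if the string is not understood."""
    match = re.fullmatch(r"(?:page(\d+)-)?(\d+)px", param_string)
    if not match:
        return None
    params = {"width": int(match.group(2))}
    if match.group(1) is not None:
        params["page"] = int(match.group(1))
    return params


def normalise_params(params, page_count=1):
    """Fix up invalid values; return None to abort processing.

    Applying it to its own output must not change anything (idempotent).
    """
    width = params.get("width")
    if not isinstance(width, int) or width <= 0:
        return None  # abort: unusable width
    out = dict(params)
    # Clamp the page number into the valid range
    out["page"] = min(max(int(out.get("page", 1)), 1), page_count)
    out["physicalWidth"] = width
    return out


original = {"page": 3, "width": 220}
round_tripped = parse_param_string(make_param_string(original))
normalised = normalise_params({"width": 100, "page": 9}, page_count=5)
print(round_tripped, normalised)
```

A round trip through make_param_string and parse_param_string returns the original array, and running normalise_params on its own output leaves it unchanged.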

Event Timeline

Tgr created this task. Dec 16 2016, 9:31 PM
Restricted Application added a subscriber: Aklapper. Dec 16 2016, 9:31 PM
Restricted Application added projects: Multimedia, Commons. Dec 16 2016, 9:33 PM
Tgr updated the task description. Dec 18 2016, 3:18 AM
Tgr updated the task description.
Tgr updated the task description. Dec 18 2016, 3:41 AM
Tgr updated the task description. Dec 18 2016, 4:42 AM
MarkTraceur moved this task from Untriaged to Tracking on the Multimedia board. Dec 22 2016, 5:23 PM
Tgr closed this task as Resolved. Jan 28 2017, 1:45 AM
Tgr claimed this task.

Has this been put on wiki somewhere?

Tgr added a comment. Jan 30 2017, 6:12 PM

No. Should it? It was intended as documentation for the RfC.

I think there's lasting value in it, yes. If you don't want to do it, I'll probably recycle that content on wiki once I do a thumbnailing stack documentation sprint after Thumbor is done.

Tgr added a comment. Feb 4 2017, 8:53 PM

I'm not actively opposed to it, just don't have the time right now. Feel free to poke me in a few weeks if it is not urgent.