- Affected components: TBD.
- Engineer for initial implementation: TBD.
- Code steward: TBD.
Motivation
(Define the problem you are seeking to solve.)
Requirements
(Specify the requirements that a proposal should meet.)
- …
Exploration
This task was split out of T66214, as establishing an API for thumbnails is more pressing than moving to content-hash-based thumb identifiers. The thumbnail API can accommodate either naming scheme without too much trouble, which lets us tackle the move to content-hash-based addressing in a second phase.
Identifying thumbs by content hash instead of human-readable names
Content-hash-based URLs for media files and thumbnails have some advantages over the current pretty names:
- automatic cache busting
- consistency between HTML revisions and the media referenced in them, in particular for old revisions (important for HTML storage and Parsoid)
- natural content-based deduplication
- content-based image blocking (bad image lists, etc.)
- media renames don't trigger HTML updates
- simplifies a potential migration of all media content to commons
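As a rough illustration of the cache-busting, deduplication and rename points above, here is a minimal sketch. The hash function (SHA-256) and the URL layout `/thumb/<hash>/<width>px` are assumptions made for this example only; nothing in this task fixes either choice.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Hex digest of the file bytes. SHA-256 is an assumption, not a decision."""
    return hashlib.sha256(data).hexdigest()


def thumb_url(data: bytes, width: int) -> str:
    """Build a thumbnail URL. The layout is hypothetical: /thumb/<hash>/<width>px."""
    return f"/thumb/{content_key(data)}/{width}px"


original = b"example image bytes"
reupload = b"example image bytes"             # same content under another title
new_version = b"example image bytes, edited"  # a new upload over the old file

# Identical bytes map to the same URL, whatever the file is called
# (deduplication; a rename does not change the URL).
same_url = thumb_url(original, 220) == thumb_url(reupload, 220)

# Changed bytes yield a new URL, so stale cache entries are never hit
# (automatic cache busting).
new_url = thumb_url(original, 220) != thumb_url(new_version, 220)
```

Because the title never enters the URL, HTML that embeds the hash keeps pointing at exactly the bytes it was rendered against.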
There are also some disadvantages:
- need to use the Content-Disposition header to suggest a pretty name when an image is saved
- need to think about quick image purging for copyvio cases, as cache busting is not enough there
- applying access restrictions is more complicated, as it requires querying all image revisions that refer to the hash and choosing which restriction to apply (likely "least-restrictive restriction wins")
- media edits (i.e. uploading a new version) do trigger HTML updates
- hash collisions could be used for vandalism if the chosen hash mechanism turns out to be susceptible to practical preimage attacks and reuploads of the same content are allowed (which may be desirable, as it allows data corruption to be fixed easily)
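Two of the points above can be sketched in a few lines: suggesting a pretty name via Content-Disposition, and resolving "least-restrictive restriction wins" across all image revisions that share a hash. The restriction names and their ordering are hypothetical placeholders, and the header uses the `filename*` form from RFC 6266 so that non-ASCII titles survive; treat this as a sketch under those assumptions rather than a proposed implementation.

```python
from urllib.parse import quote

# Hypothetical restriction levels, ordered from least to most restrictive.
RESTRICTION_LEVELS = {"public": 0, "logged-in": 1, "suppressed": 2}


def content_disposition(pretty_name: str) -> str:
    """Suggest a human-readable file name for a hash-addressed URL."""
    # Percent-encode the UTF-8 bytes of the name (RFC 6266 / RFC 5987 form).
    return f"inline; filename*=UTF-8''{quote(pretty_name, safe='')}"


def effective_restriction(revision_restrictions: list[str]) -> str:
    """Pick the least restrictive level among all revisions sharing a hash."""
    if not revision_restrictions:
        raise ValueError("at least one image revision must refer to the hash")
    return min(revision_restrictions, key=RESTRICTION_LEVELS.__getitem__)


header = content_disposition("Café au lait.jpg")
level = effective_restriction(["suppressed", "public", "logged-in"])
```

Note that the lookup needs every revision referring to the hash, which is the extra query cost mentioned in the list.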