RFC: Use content hash based image / thumb URLs
Open, Medium, Public

Description

Affected components: TBD.
Engineer for initial implementation: TBD.
Code steward: TBD.

Motivation

(Define the problem you are seeking to solve.)

Requirements

(Specify the requirements that a proposal should meet.)

Exploration

This task was split out of T66214, as establishing an API for thumbnails is more pressing than moving to content hash based thumb identifiers. The thumbnail API can accommodate either without too much trouble, which lets us tackle the move to content hash based addressing in a second phase.

Identifying thumbs by content hash instead of human-readable names

Content hash based URLs for media files and thumbnails have some advantages over the current pretty names:

automatic cache busting, since changed content gets a new URL
consistency between HTML revisions and the media they reference, in particular for old revisions (important for HTML storage and Parsoid)
natural content-based deduplication
content-based image blocking (bad image lists etc.)
media renames don't trigger HTML updates
simpler potential migration of all media content to Commons
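As an illustration of the cache busting and deduplication points above, here is a minimal sketch of deriving a URL from file content. It assumes SHA-1 in a zero-padded 31-character base-36 form (the same length as the hash in the example thumbnail URL later in this task); the `/media/...` URL layout and the function names are hypothetical and not part of any existing API.

```python
import hashlib
from typing import Optional

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def sha1_base36(data: bytes) -> str:
    """Return the SHA-1 of data as a zero-padded, 31-character base-36 string."""
    n = int.from_bytes(hashlib.sha1(data).digest(), "big")
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(BASE36_DIGITS[rem])
    # 36**31 > 2**160, so 31 digits always suffice for a SHA-1 value.
    return "".join(reversed(digits)).rjust(31, "0")


def content_url(data: bytes, extension: str, width: Optional[int] = None) -> str:
    """Build a hypothetical content-addressed URL for an original or a thumbnail."""
    digest = sha1_base36(data)
    if width is None:
        return "/media/{}.{}".format(digest, extension)
    return "/media/thumb/{}px-{}.{}".format(width, digest, extension)


# Identical bytes map to the same URL (deduplication), while a new upload
# with different bytes gets a different URL (cache busting).
old_version = b"first upload"
new_version = b"second upload"
print(content_url(old_version, "jpg", width=800))
print(content_url(new_version, "jpg", width=800))
```

Because the URL depends only on the bytes, a rename leaves it unchanged, whereas uploading a new version changes it. That is the trade-off listed among the disadvantages below.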

There are also some disadvantages:

need to use the Content-Disposition header to suggest a pretty name when saving an image
need to think about quick image purging for copyvio cases, as cache busting is not enough there
applying access restrictions is more complicated, as it requires querying all image revisions that refer to the hash and choosing which restriction to apply (likely "least-restrictive restriction wins")
media edits (i.e. uploading a new version) do trigger HTML updates
hash collisions could be used for vandalism, should the chosen hash mechanism turn out to be susceptible to practical preimage attacks and re-uploads of the same content are allowed (which may be desirable, as it makes fixing data corruption easy)
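Two of the points above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and modelling restrictions as integers (where a lower number means less restrictive) is an assumption, not an existing MediaWiki data model. The header format follows the `filename` / `filename*` parameters of RFC 6266.

```python
from typing import Mapping
from urllib.parse import quote


def content_disposition(pretty_name: str) -> str:
    """Suggest a human-readable file name for a hash-addressed file."""
    # ASCII fallback for old clients; characters that cannot be
    # represented, or that would break the quoted string, are replaced.
    fallback = pretty_name.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace("\\", "_").replace('"', "_")
    # Percent-encoded UTF-8 form for clients that understand filename*.
    encoded = quote(pretty_name, safe="")
    return 'inline; filename="{}"; filename*=UTF-8\'\'{}'.format(fallback, encoded)


def effective_restriction(levels_by_revision: Mapping[str, int]) -> int:
    """Apply "least-restrictive restriction wins" across all image
    revisions that share one content hash (lower number = less restrictive)."""
    if not levels_by_revision:
        raise ValueError("no image revision refers to this hash")
    return min(levels_by_revision.values())


print(content_disposition("Munich_subway_station_Westfriedhof2.jpg"))
print(effective_restriction({"File:A.jpg@rev1": 2, "File:B.jpg@rev1": 0}))
```

The second function shows why the lookup is more expensive than today: every revision that refers to the hash has to be found before the effective restriction is known.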

Related Objects

Status	Subtype	Assigned	Task
Declined		• mobrovac	T115876 High-traffic API endpoints to cover in RESTBase
Invalid		• Pchelolo	T116840 Cached REST end point for imageinfo requests
Resolved		• brooke	T3780 Can't upload file with non-ASCII name (eg cyrillic) on Windows host
Open	BUG REPORT	None	T37721 Moving files breaks hotlinks to original file asset
Open		None	T139294 Persistent media links for file versions
Stalled		None	T192571 Remove need for client side thumbnail handling in Popups by using a thumb api
Open		None	T66214 Define an official thumb API
Resolved		Dereckson	T191572 Please upload large file to Wikimedia Commons
Declined	Request	None	T382859 Server-side upload request for Koavf
Open		None	T191802 [Epic] Determine a strategy to store files between 5 and 100 GB
Declined		None	T125920 [EPIC] Future exciting reading web performance endeavours
Stalled		None	T19577 Thumbnail urls should be versioned and sent with Expires headers
Open		None	T149847 RFC: Use content hash based image / thumb URLs

Event Timeline

GWicke created this task. Nov 2 2016, 9:12 PM

GWicke removed brooke as the assignee of this task. Nov 2 2016, 9:32 PM

GWicke updated the task description.

FYI, I sort of implemented a solution for this on Vagrant a while ago for the current thumbnail URI scheme, by replacing the second instance of the file name in the URI with the SHA1 of the original.

E.g. http://127.0.0.1:6081/images/thumb/d/d7/Munich_subway_station_Westfriedhof2.jpg/800px-Munich_subway_station_Westfriedhof2.jpg becomes http://127.0.0.1:6081/images/thumb/d/d7/Munich_subway_station_Westfriedhof2.jpg/800px-2zeso4ug3i3dsai23sd7thu7kdnndiq.jpg

It only causes minor breakage for code in the wild that read the original's file name from the second occurrence; such code has to be updated to read the first occurrence instead.

The option that makes that happen is $supportsSha1URLs on the FileRepo.
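The URL rewrite described in this comment can be sketched as follows. This is not the Vagrant implementation, only an illustration of the transformation on the example URL; both function names are hypothetical, the hash is taken as given, and only the simple `<width>px-<name>` thumbnail form is handled.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def to_sha1_thumb_url(url: str, sha1_base36: str) -> str:
    """Replace the second occurrence of the file name in a thumb URL
    with the given hash of the original, keeping the size prefix."""
    parts = urlsplit(url)
    directory, thumb_name = posixpath.split(parts.path)
    original_name = posixpath.basename(directory)
    prefix, sep, rest = thumb_name.partition("-")
    if not sep or rest != original_name:
        raise ValueError("unsupported thumbnail name: " + thumb_name)
    extension = posixpath.splitext(original_name)[1]
    new_name = "{}-{}{}".format(prefix, sha1_base36, extension)
    return urlunsplit(parts._replace(path=posixpath.join(directory, new_name)))


def original_name_from_thumb_url(url: str) -> str:
    """Read the original file name from its first occurrence in the path."""
    return posixpath.basename(posixpath.dirname(urlsplit(url).path))


before = (
    "http://127.0.0.1:6081/images/thumb/d/d7/"
    "Munich_subway_station_Westfriedhof2.jpg/"
    "800px-Munich_subway_station_Westfriedhof2.jpg"
)
after = to_sha1_thumb_url(before, "2zeso4ug3i3dsai23sd7thu7kdnndiq")
print(after)
print(original_name_from_thumb_url(after))
```

Consumers that need the original's name would use something like the second function, since the last path segment no longer contains it.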

ori moved this task from Inbox, needs triage to Blocked (old) on the Performance-Team board. Nov 3 2016, 9:46 PM

Gilles moved this task from Blocked (old) to Radar on the Performance-Team board. Nov 3 2016, 9:48 PM

GWicke mentioned this in T66214: Define an official thumb API. Nov 4 2016, 9:18 PM

ema moved this task from Backlog to Caching on the Traffic board. Nov 7 2016, 11:36 AM

matmarex mentioned this in T150113: Uploading a newer version of a file; previous version cached by the browser, shown stretched to width/height of the new version. Nov 7 2016, 8:01 PM

matmarex mentioned this in T38380: After re-uploading a file, users still see the browser-cached thumbnail for the old version. Nov 7 2016, 8:10 PM

Dbrant moved this task from Needs Triage to Tracking on the Wikipedia-Android-App-Backlog board. Nov 9 2016, 2:51 PM

JMinor moved this task from Needs Triage to Tracking on the Wikipedia-iOS-App-Backlog board. Nov 10 2016, 7:16 PM

MarkTraceur moved this task from Untriaged to Desired epics on the Multimedia board. Nov 16 2016, 7:06 PM

daniel moved this task from P1: Define to Old on the TechCom-RFC board. Nov 30 2016, 9:34 PM

Arlolra moved this task from Needs Triage to Non-Parsing-Team Tasks on the Parsoid board. Dec 2 2016, 7:34 PM

Tgr mentioned this in T153565: MediaWiki file operations are fragile, causing occasional data loss. Dec 18 2016, 1:08 AM

NicJansma mentioned this in T19577: Thumbnail urls should be versioned and sent with Expires headers. Jan 19 2017, 4:09 AM

MarkTraceur unsubscribed. Jan 27 2017, 10:16 PM

MarkTraceur subscribed. Jan 27 2017, 10:51 PM

Krinkle removed projects: TechCom-Has-shepherd, Services-next. Feb 1 2017, 9:38 PM

Krinkle removed a project: Performance Issue. Jun 14 2017, 6:41 PM

GWicke edited projects, added Services (later); removed Services (next). Jun 20 2017, 2:57 PM

dr0ptp4kt moved this task from Backlog to Infrastructure on the Reading-Admin board. Jul 20 2017, 9:41 PM

zhuyifei1999 moved this task from Incoming to Thumbnail and file renderings on the Commons board. Jul 25 2017, 10:06 AM

zhuyifei1999 subscribed.

Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team. Aug 8 2017, 3:15 AM

Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board. Aug 16 2017, 7:57 PM

Note to future selves: we'd probably want the filename to contain the human-readable File: name, so that people and machines can generate search results more easily.

cscott mentioned this in T164655: Store and serve annotations in W3C standard format. Oct 13 2017, 4:54 PM

Krinkle removed a project: Proposal. Dec 21 2017, 11:38 PM

Chicocvenancio subscribed. Feb 15 2018, 4:00 PM

bearND mentioned this in T188636: [Analysis] Image size too large on previews. Mar 7 2018, 5:39 PM

Krinkle renamed this task from Use content hash based image / thumb URLs to RFC: Use content hash based image / thumb URLs. Mar 21 2018, 9:03 PM

aaron added a parent task: T191802: [Epic] Determine a strategy to store files between 5 and 100 GB. Apr 12 2018, 5:34 PM

aaron mentioned this in T191802: [Epic] Determine a strategy to store files between 5 and 100 GB.

Removing from perf radar in favour of T19577.

mobrovac added a project: Platform Team Legacy (Later). Dec 20 2018, 12:07 PM

MaxSem removed a project: Zero. Jan 3 2019, 11:16 PM

Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board. Jan 26 2019, 8:54 PM

LGoto removed a project: Wikipedia-Android-App-Backlog. Aug 27 2019, 8:14 PM

Krinkle unsubscribed. Aug 27 2019, 8:16 PM

Tgr mentioned this in T244712: Allow users to query mediarequests using a file page link. Feb 10 2020, 8:24 PM

dr0ptp4kt moved this task from Infrastructure to Tracking on the Reading-Admin board. Mar 18 2020, 6:41 PM

Krinkle updated the task description. Apr 3 2020, 9:55 PM

Krinkle moved this task from Old to P1: Define on the TechCom-RFC board.

Tgr mentioned this in T249419: RFC: Render data visualizations on the server. Apr 15 2020, 8:14 PM

Demian subscribed. May 21 2020, 9:28 AM

AntiCompositeNumber mentioned this in T260272: Thumbnails for a specific .xcf file give "Error: 429, Too Many Requests". Aug 12 2020, 6:54 PM

AntiCompositeNumber subscribed. Aug 12 2020, 7:02 PM

Aklapper removed subscribers: Fabrice_Florin, Tbayer, BGerstle-WMF and 4 others. Oct 16 2020, 5:02 PM

MBinder_WMF edited projects, added Parsoid (Tracking); removed Parsoid. Dec 10 2020, 8:12 PM

Meow moved this task from Thumbnail and file renderings to Incoming on the Commons board. Jul 28 2021, 9:05 AM

Meow moved this task from Incoming to Thumbnail and file renderings on the Commons board. Jul 28 2021, 9:26 AM

BBlack moved this task from Caching to Icebox-Temp on the Traffic board. Oct 8 2021, 5:27 PM

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all tickets that are neither part of our current planned work nor clearly a recent, higher-priority emergent issue. This is simply one step in a larger task cleanup effort. Further triage of these tickets (and especially, organizing future potential project ideas from them into a new medium) will occur afterwards! For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!

LGoto removed a project: Wikipedia-iOS-App-Backlog. Mar 22 2022, 8:12 PM

BBlack moved this task from Backlog to Close or Untag? on the Traffic-Icebox board. Apr 7 2022, 9:10 PM

BCornwall removed a project: Traffic-Icebox. Mar 29 2023, 6:27 PM

Ladsgroup removed a project: SRE. Mar 31 2023, 1:22 AM

dr0ptp4kt unsubscribed. Jul 28 2023, 3:31 PM