Description

Use cases / problem statement

Dynamic client-side thumb size / quality selection without extra round-trips: Clients would like to adapt image size and quality to actual network and device characteristics. Most current techniques for lazy-loading and dynamic resource selection use JS. The ability to select the thumb size and quality in this code without extra round-trips would enable interesting performance optimizations.

Caching of API responses referencing thumbnails: Many cached API responses (such as summary or related pages) contain references to thumbnails. Currently, there is no way for clients to sanely select thumb sizes, which means that we either need to fragment caches on thumb size, or try to include several sizes in the response. Good caching of API responses is becoming more pressing as high-traffic features like hovercards (see T70860) are built as direct API consumers. The need to negotiate thumb sizes supported by API end points introduces delays, and reduces the design flexibility in clients.

Image info caching: Parsoid and other clients like VisualEditor currently need to request image information in order to render thumbnails. Those requests are very common, making imageinfo one of the most-used API entry points. Currently, the request necessarily contains the desired thumb size, which renders caching of imageinfo responses ineffective. If the response was dimension-independent instead, most of these responses could be served directly from caches. This would reduce response latency and load on the API cluster and related infrastructure.

API requirements

Simple selection of thumb size and quality without a need for extra API calls.
Avoid cache fragmentation with deterministic URLs.
Support for encoding complex options in a uniform and extensible manner, without breaking existing use cases or introducing non-determinism.
Optional support for content negotiation (ex: client hints) in the future.
Support migrating to hash-based image identification in a later stage.

API proposal: Use query strings

/v1/someimage.jpg: Original. Returned by API end points referencing thumbnails.
/v1/someimage.jpg?w=220: JPG thumb 220px wide.
/v1/someimage.jpg?p=22&w=500: 500px thumb of page 22 in a multi-page document.
/v1/someimage.jpg?t=2m30s&w=220: Thumb of a video at 2m30s.
/v1/someimage.jpg?lang=fr&w=220: Thumb of an SVG, rendered to a PNG using French texts. We aren't explicitly mentioning the file format, so the client does not need to know that the original is an SVG, or that it is rendered to a PNG image.
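
The URL forms above can be sketched with a small client-side helper. This is only an illustration: the function name `thumb_url` is hypothetical, and sorting parameters alphabetically is an assumption (it matches the order used in the examples, but the proposal does not fix a canonical order here).

```python
from urllib.parse import urlencode


def thumb_url(base, **params):
    """Build a thumb URL from a base image URL and optional parameters.

    Unset (None) parameters are omitted rather than sent with a default,
    and the rest are emitted in sorted order so that the same thumb is
    always requested with the same URL (assumed canonical order).
    """
    items = sorted((k, str(v)) for k, v in params.items() if v is not None)
    if not items:
        return base  # no parameters: the original
    return base + "?" + urlencode(items)


print(thumb_url("/v1/someimage.jpg"))
print(thumb_url("/v1/someimage.jpg", w=220))
print(thumb_url("/v1/someimage.jpg", w=500, p=22))
print(thumb_url("/v1/someimage.jpg", w=220, t="2m30s"))
print(thumb_url("/v1/someimage.jpg", w=220, lang="fr"))
```

The five calls reproduce the five example URLs listed above, regardless of the order in which the caller passes the parameters.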

Server side requirements:

Audit and document the existing query string API in thumb.php. T153497
Add strict parameter validation. Each thumb should have only a single URL. Don't allow unknown parameters, and (generally) avoid specifying default values explicitly. Exceptions can be made for page & time offset parameters, where little actual fragmentation is expected, and consistency in the use of the parameter is important.
Query string order normalization in Varnish (vmod): T138093
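
As a rough sketch of what strict validation plus order normalization could look like (written in Python for illustration, not as Varnish or thumb.php code): the parameter whitelist, the value patterns, and the alphabetical ordering below are all assumptions, not part of the audited thumb.php API.

```python
import re

# Hypothetical whitelist: parameter name -> pattern of accepted values.
# Leading zeros are rejected so that "w=220" and "w=0220" cannot both exist.
ALLOWED = {
    "w": re.compile(r"[1-9][0-9]*"),
    "p": re.compile(r"[1-9][0-9]*"),
    "t": re.compile(r"(?=.)(?:[0-9]+h)?(?:[0-9]+m)?(?:[0-9]+s)?"),
    "lang": re.compile(r"[a-z]{2,3}(?:-[a-z0-9]+)*"),
}


def normalize_query(query):
    """Validate a raw query string and return it in sorted parameter order.

    Raises ValueError for unknown, malformed, duplicate, or invalid
    parameters instead of silently ignoring them.
    """
    if not query:
        return ""
    seen = {}
    for field in query.split("&"):
        name, sep, value = field.partition("=")
        if not sep or name not in ALLOWED:
            raise ValueError(f"unknown or malformed parameter: {field!r}")
        if name in seen:
            raise ValueError(f"duplicate parameter: {name!r}")
        if not ALLOWED[name].fullmatch(value):
            raise ValueError(f"invalid value for {name!r}: {value!r}")
        seen[name] = value
    return "&".join(f"{name}={seen[name]}" for name in sorted(seen))


print(normalize_query("w=500&p=22"))
```

Rejecting rather than repairing bad input keeps the mapping from thumb to URL one-to-one; only the parameter order is normalized.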

Providing original dimensions to clients

In order to accurately calculate and select thumbnail dimensions, clients need to know the original image's dimensions (where applicable). Using this information, clients can then construct a unique URL for a given thumbnail size, independent of the constraints they applied to select this size. They can also avoid content jumping around by updating image dimensions to the exact thumbnail dimensions before the thumbnail has loaded.

Currently, MediaWiki already provides original dimensions in data-file-{width,height} attributes for MediaViewer's benefit. Some API responses referencing thumbnails include the equivalent information in JSON:

image: {
  src: "/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12",
  width: 640,
  height: 480
}

We can either stick to these different formats, or consider unifying this information in the URL, either in query parameters (/someimage.jpg?oh=768&ow=1024&w=100), or separately in a fragment (/someimage.jpg?w=100#oh=768&ow=1024). The latter avoids sending back purely informational parameters to the server.
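
A minimal sketch of how a client might use the fragment variant to reserve the exact layout size before the thumbnail loads. The helper names are hypothetical, and the rounding rule (round half up, minimum 1px) is an assumption; a real client would have to match whatever rule the thumbnailer applies.

```python
from urllib.parse import parse_qs, urlsplit


def dimensions_from_fragment(url):
    """Read the original width and height from 'ow' / 'oh' in the URL fragment."""
    fragment = parse_qs(urlsplit(url).fragment)
    return int(fragment["ow"][0]), int(fragment["oh"][0])


def thumb_height(orig_width, orig_height, thumb_width):
    """Scale the height proportionally, in integer arithmetic.

    Rounds half up and never returns less than 1 (assumed rounding rule).
    """
    scaled = (orig_height * thumb_width * 2 + orig_width) // (2 * orig_width)
    return max(1, scaled)


ow, oh = dimensions_from_fragment("/someimage.jpg?w=100#oh=768&ow=1024")
print(ow, oh, thumb_height(ow, oh, 100))  # 1024 768 75
```

Because the fragment is never sent to the server, the informational parameters do not affect cache keys.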

Pros

Familiar query string syntax with wide parsing support.
Does not distinguish between advanced & frequently changed properties.

Cons

Requires custom ordered query string serialization code for both simple (size, quality) & more complex use cases.
- Subtlety of ordering requirement (still works) means that users will often ignore it, causing client side cache fragmentation.
Need for general query string normalization in Varnish. Weak con, as this would be generally useful.

Options for content negotiation and -selection

We would like to use modern thumbnail formats where this has a benefit for users, but need to make sure that we don't break older clients with insufficient support in the process.

The two main approaches for this content negotiation process are:

a) Server-side HTTP content negotiation, using client-supplied headers like Accept, and
b) client-side JS explicitly requesting specific formats.

There are pros & cons to either method, and they aren't mutually exclusive. HTTP content negotiation can work without any client-side effort for bitmap formats (ex: Chrome advertises WebP support). Client-side JS can add explicit parameters for more fine-grained control, but also has only limited information to base such decisions on.

Either way, the status quo and starting point is to serve widely supported formats (JPG and PNG) by default. We don't need to solve this question right away, and the proposed API is leaving all options for content negotiation open.
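
For approach a), server-side selection could look roughly like the sketch below. It is an illustration only: the function name `pick_format` is hypothetical, `image/webp` stands in for "a modern format", and treating only explicitly listed types (not wildcards such as `*/*`) as proof of support is an assumption.

```python
def pick_format(accept_header, default="image/jpeg", modern=("image/webp",)):
    """Return a modern format only if the client lists it explicitly.

    Wildcards are ignored on purpose: '*/*' does not show that a client
    can decode a newer format, so older clients keep the default.
    """
    accepted = set()
    for part in accept_header.split(","):
        media_type, _, params = part.strip().partition(";")
        quality = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # unparsable q-value: treat as not accepted
        if quality > 0:
            accepted.add(media_type.strip().lower())
    for fmt in modern:
        if fmt in accepted:
            return fmt
    return default


print(pick_format("image/avif,image/webp,image/apng,*/*;q=0.8"))
print(pick_format("*/*"))
```

Any real deployment of this would also need caches to key on the negotiated format, which is one of the trade-offs against explicit client-side parameters.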

Deployment strategy

A change like this isn't complete without a strategy that allows us to roll out new-style thumbs gradually. To avoid performance impacts, all (simple) requests for a thumb of a given size should map to the same cache entries using either URL scheme. To this end, we can roll things out in a way that lets us *rewrite* one URL scheme to the other.

New to old style:
- Simple thumbs (only width parameter specified):
  1. Prefix the image name with the width parameter value followed by 'px-'.
  2. Calculate the MD5 of the image name, and prefix first & first two chars of hex encoding to path (ex: /7/72/).
- Complex thumbs: Let PHP code handle the request & cache the response separately.
Old to new style:
- Simple thumbs (need to check if we can determine this with a regex):
  1. Extract & strip width parameter
  2. Send a request with a query string to backend
- Complex thumbs: Let PHP code handle the request & cache the response separately.
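
The two rewrites for simple thumbs can be sketched as follows. This is a sketch under stated assumptions, not the actual Varnish logic: the old-style layout `/<h>/<hh>/<name>/<width>px-<name>` is assumed from the two steps above, the `/v1/` prefix is taken from the API proposal, and the regex is only one candidate answer to the open question of whether simple thumbs can be detected that way. Title normalization and URL encoding of file names are ignored.

```python
import hashlib
import re


def new_to_old(name, width):
    """Map a simple new-style thumb (name plus width) to an old-style path."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # First hex char, then first two hex chars, then '<width>px-<name>'.
    return f"/{digest[0]}/{digest[:2]}/{name}/{width}px-{name}"


# Matches only simple thumbs: the thumb file name must be exactly
# '<width>px-' followed by the original name (assumed layout).
OLD_SIMPLE = re.compile(
    r"^/[0-9a-f]/[0-9a-f]{2}/(?P<name>[^/]+)/(?P<width>[1-9][0-9]*)px-(?P=name)$"
)


def old_to_new(path):
    """Map an old-style simple thumb path to a new-style URL.

    Returns None for anything else, so that complex thumbs can be
    passed through to the PHP code unchanged.
    """
    match = OLD_SIMPLE.match(path)
    if match is None:
        return None
    return f"/v1/{match.group('name')}?w={match.group('width')}"


old = new_to_old("someimage.jpg", 220)
print(old)
print(old_to_new(old))
```

Round-tripping a simple thumb through both functions yields the same new-style URL, which is the property needed for both schemes to share cache entries.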

Users / apps making assumptions about the current thumb URL format

See T153498.

Migration strategies in Varnish

Feasibility of rewriting majority of "simple" thumbnails (no key-value parameters) in Varnish.
Feasibility of avoiding redirect latency penalty by resolving redirect responses from thumb service in Varnish.

Small MediaWiki installs

Currently, MediaWiki defaults to serving thumbnails directly from an upload directory. This means that there is no PHP code involved in serving thumbnails. This is good for performance (especially without caching), but also means that on-demand generation & a parameter-based API cannot be supported out of the box. There are several options we can pursue:

1) Start to serve all thumbs through thumb.php (or an API module). The migration to this would generally be easy (especially with the API), but we would add the overhead of serving thumbs through PHP. Authentication would be supported out of the box. Caching could eliminate the performance issue for higher volume installs, or in an appliance container install that includes Varnish.
2) Create a new way of supporting direct file serving with 404 handler and storage based on encoded query strings. This requires an advanced web server configuration, and might not be possible with less common web servers.

At this point, the default option would be 1), but we can always optimize the setup with 2). We can provide 2) by default in a container-based distribution solution.

Details

Reference: bz64214

Related Objects

Status	Subtype	Assigned	Task
Declined		• mobrovac	T115876 High-traffic API endpoints to cover in RESTBase
Invalid		• Pchelolo	T116840 Cached REST end point for imageinfo requests
Resolved		• brooke	T3780 Can't upload file with non-ASCII name (eg cyrillic) on Windows host
Open	BUG REPORT	None	T37721 Moving files breaks hotlinks to original file asset
Open		None	T139294 Persistent media links for file versions
Stalled		None	T192571 Remove need for client side thumbnail handling in Popups by using a thumb api
Open		None	T66214 Define an official thumb API
Resolved		None	T120544 Package bloomd
Open		None	T149847 RFC: Use content hash based image / thumb URLs
Invalid		None	T150673 Thumb API: Varnish / CDN questions
Resolved		Tgr	T153498 Document current clients which use thumb URLs as an API
Resolved		Tgr	T153497 Document current MediaWiki thumbnail URL format & processing logic

Event Timeline

Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.Aug 16 2017, 6:55 PM

SamanthaNguyen unsubscribed.Sep 18 2017, 6:37 PM

Anomie mentioned this in T182475: Handling of structured data input in MediaWiki APIs.Dec 11 2017, 5:27 PM

Krinkle removed projects: TechCom-Has-shepherd, Proposal, Services-next, RESTBase-API.Dec 21 2017, 11:44 PM

Krinkle moved this task from In progress to Under discussion on the TechCom-RFC board.Dec 22 2017, 12:47 AM

• bmansurov unsubscribed.Dec 22 2017, 9:54 PM

• ssastry moved this task from Needs Triage to Non-Parsing-Team Tasks on the Parsoid board.Jan 11 2018, 9:49 PM

Chicocvenancio subscribed.Feb 15 2018, 4:00 PM

phuedx mentioned this in T187955: Page preview shows icon instead of thumbnail.Feb 22 2018, 5:11 PM

Tgr mentioned this in T91104: PHP thumbnailer as a service.Feb 23 2018, 2:13 AM

Perhelion subscribed.Mar 2 2018, 4:25 AM

• bearND mentioned this in T188636: [Analysis] Image size too large on previews.Mar 7 2018, 6:37 PM

TechCom is declining because the use case is not current. This needs a new owner and use case.

There is actually one current use case still, which is the idea of having the thumb URLs be versioned instead of “current” (ref. Performance-Team), but we intend to address that at a later point with a different proposal.

The current thumbnail URL scheme could easily start including a revision number or sha1 of the original without changing the format.

In T66214#4064480, @kchapman wrote:

TechCom is declining because the use case is not current.

Everything listed in the task's description under "Use cases / problem statement" seems current to me. See also T66214#1842437.

Thanks @Anomie my information might be old. Moving to TechCom-RFC Inbox for discussion.

jeremyb subscribed.Mar 21 2018, 8:38 PM

• bearND awarded a token.Mar 21 2018, 9:33 PM

TechCom discussed this at our last meeting. The problem statement is still valid, but that doesn't mean it needs to be kept open as there is currently no resourcing for this. If there is still interest in this issue it could be used as material for a new RFC but the new RFC should contain one problem statement. Note: T149847: RFC: Use content hash based image / thumb URLs has already been broken out into a single issue.

We are moving this to last call to be declined, closing on 2018-03-29 at 1 pm PST (21:00 UTC, 22:00 CET).

@kchapman we are interested in picking this up in Reading Infrastructure, but haven't been able to get to it. We would still like to do this if we can find some time… I should know more in Q4 about feasibility/timing.

For context: We have lots of client code with workarounds for getting the right sizes of images, which means a lot of duplication and bugs. We want to get rid of this code in the client and instead use this much more flexible proposed API.

The problem here is that TechCom is using Phabricator in a different way from the rest of the movement.

The normal way is that you create one task for one concept / task, and multiple groups share that task and do their workflow management in such a way that it does not conflict with that of other groups. That means using projects or workboards (since there can be any number of those but there is only one task status). Declining a task means that it was decided that it should not be done (ie. people should be actively prevented from doing it) because it is a bad idea. Most tasks are not resourced but kept open (or stalled) nevertheless.

So if TechCom insists on their current workflow, it should make it clear that it prefers a different workflow, and for every idea that goes through the TechCom process there should be a separate idea task and a separate RfC task so that the TechCom can decline the RfC task when there is no resourcing, while the idea task can be kept open as something that's potentially still valid; and the existing RfC tasks should be split in two. Or TechCom should change their workflow to match that of everyone else on Phabricator, and use a workboard column or the removal of the TechCom tag or something similar for tracking "rejected without prejudice".

In either case, please keep this task open, whether it gets resourced in the near future or not. The use cases described here are still valid, and the solution proposed here still captures our best understanding of the solution space; declining it would be confusing and hamper the use of Phabricator as a technical knowledge management system.

Liuxinyu970226 awarded a token.Mar 25 2018, 7:39 AM

@Tgr perhaps I was not as clear as I could have been. The other issue we see is there should be multiple RFCs broken out for that. Perhaps that means this is not an RFC, but an overall task that has RFCs linked to it.

I will bring up the process in the next TechCom meeting.

In T66214#4086547, @kchapman wrote:

The other issue we see is there should be multiple RFCs broken out for that. Perhaps that means this is not an RFC, but an overall task that has RFCs linked to it.

I would disagree, I don't think this RfC can be meaningfully broken up. There are a few ideas it mentions but does not actually propose to do (hash-based identification, content negotiation) and those could be removed for clarity, but that's all I can see.

But regardless, even if the task is not a real RfC, that's not a good reason to decline it as a Phabricator task. Probably the RfC project tag should just be removed in that case.

@Tgr we are just putting it in the Declined TechCom-RFC workboard, not in Phabricator as a whole. For reference, this is how we approach declining RFCs now: https://phabricator.wikimedia.org/T184653

TechCom is declining at this time but will be more than happy to discuss further RFCs on this topic in the future (noting that @Fjalapeno mentioned interest in picking this up).

Ugh, I am really sorry, I don't know how I could misread that so badly :( I guess my eye is trained to react to the word "Declined" in Phabricator emails, without reading them properly.

• Fjalapeno merged a task: T172221: Page summary API: Find a sane way to allow clients to select a page image thumb size.Apr 9 2018, 8:21 PM

• Fjalapeno added a project: Product-Infrastructure-Team-Backlog-Deprecated (Kanban).Apr 17 2018, 4:17 PM

Jdlrobson mentioned this in T192571: Remove need for client side thumbnail handling in Popups by using a thumb api.Apr 19 2018, 4:48 PM

daniel mentioned this in T106240: Colorable SVG.Apr 25 2018, 9:57 PM

• Jhernandez subscribed.Jun 12 2018, 9:32 AM

• Imarlier removed a project: Performance-Team (Radar).Jun 20 2018, 9:36 AM

Krinkle removed a parent task: T19577: Thumbnail urls should be versioned and sent with Expires headers.Jun 20 2018, 1:58 PM

• Mholloway edited projects, added Product-Infrastructure-Team-Backlog-Deprecated; removed Product-Infrastructure-Team-Backlog-Deprecated (Kanban).Jun 27 2018, 5:24 PM

@Jdrewniak shared a pointer to the https://cloudinary.com API which I thought could be a source of inspiration for this task. Be sure to check out their 57 second demo.

• Jhernandez moved this task from Needs triage to Epics on the Product-Infrastructure-Team-Backlog-Deprecated board.Jul 6 2018, 12:43 PM

• bearND mentioned this in T193275: PCS Media endpoint for Android.Aug 3 2018, 8:40 PM

• Imarlier awarded a token.Aug 13 2018, 5:46 PM

Tgr mentioned this in T206074: Wikimedia Technical Conference 2018 Session - Choosing the technologies to build our APIs.Oct 11 2018, 5:39 AM

• mobrovac added a project: Platform Team Legacy (Watching / External).Dec 20 2018, 12:02 PM

MaxSem removed a project: Zero.Jan 3 2019, 11:42 PM

Reedy removed subscribers: • dpatrick, • RobLa-WMF, • BGerstle-WMF, • Fabrice_Florin.Jan 3 2019, 11:43 PM

Krinkle closed subtask T120544: Package bloomd as Resolved.Jan 4 2019, 8:46 PM

Krinkle removed a parent task: T89971: ApiQueryImageInfo is crufty, needs rewrite.

Krinkle updated the task description.

• Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board.Jan 26 2019, 7:50 PM

Anomie mentioned this in T223239: REST API Parameter Validation.May 24 2019, 6:13 PM

Anomie mentioned this in T226830: Add information about available scaled images to list=allimages.Jul 11 2019, 6:37 PM

EvanProdromou subscribed.Aug 6 2019, 6:29 PM

LGoto removed a project: Wikipedia-iOS-App-Backlog.Aug 27 2019, 8:03 PM

Aklapper mentioned this in T231454: Provide practical thumbnail URL.Sep 1 2019, 8:32 PM

simon04 merged a task: T231454: Provide practical thumbnail URL.Sep 1 2019, 9:46 PM

simon04 added subscribers: simon04, TheDJ, MartinK.

Abbe98 subscribed.Dec 30 2019, 4:58 PM

Krinkle removed a project: TechCom-RFC.Feb 19 2020, 11:08 PM

Krinkle added a project: TechCom-RFC (TechCom-RFC-Closed).

Krinkle moved this task from Untriaged to Declined on the TechCom-RFC (TechCom-RFC-Closed) board.

Anomie mentioned this in T245673: Reader gets page thumbnail with search results.Feb 24 2020, 6:58 PM

dr0ptp4kt moved this task from Infrastructure to Tracking on the Reading-Admin board.Mar 19 2020, 7:27 PM

• Jhernandez unsubscribed.Apr 2 2020, 6:46 PM

Aklapper added a parent task: T192571: Remove need for client side thumbnail handling in Popups by using a thumb api.May 14 2020, 11:23 AM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips how to manage individual work in Phabricator (noisy notifications, lists of task, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

phuedx unsubscribed.Jul 27 2020, 11:49 AM

AntiCompositeNumber subscribed.Aug 12 2020, 7:03 PM

Aklapper removed subscribers: Anomie, • Tbayer.Oct 16 2020, 5:02 PM

MBinder_WMF edited projects, added Parsoid (Tracking); removed Parsoid.Dec 10 2020, 8:12 PM

Paladox subscribed.Jan 23 2021, 7:35 PM

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!