
Thumbnail urls should be versioned and sent with Expires headers
Open, Stalled, Medium, Public

Description

Author: sergey.chernyshev

Description:
Image URL with timestamp patch

I'm optimizing the performance of MediaWiki instances, and one of the issues I came across is that image URLs in MediaWiki don't change over time as the images themselves change, so it's impossible to set far-future Expires headers for them to keep them firmly in browser caches.

Here's the link explaining this particular issue: http://developer.yahoo.com/performance/rules.html#expires

You can see this issue in action here: http://performance.webpagetest.org:8080/result/090218_132826127ab7f254499631e3e688b24b/ (a simple two-run test of the http://en.wikipedia.org/wiki/Hilary_Clinton page). Notice that on the repeat run all image requests are sent again even though the images didn't change, so we get 55 extra requests with 304 responses, which requires 4 more connections to the commons server (see the "Connection View" section below), all of which could be avoided. This might get even worse if we test consecutive views of pages sharing only some of their images; in that case, loading of images after the ones already requested will be blocked.

I didn't try to calculate the traffic savings (they can be significant even though only headers are being sent), but it could be done based on some statistics.

The good news is that MediaWiki already has control over the versioning of uploaded files (which is most important for images), so the solution would be simply to generate a unique query string for each version of the image.
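The idea can be sketched in a few lines (illustrative only; the parameter name and URLs are made up for this sketch, not taken from the actual patch):

```python
from urllib.parse import urlencode

def versioned_url(base_url: str, timestamp: str) -> str:
    """Append a cache-busting version parameter derived from the file's
    upload timestamp (MediaWiki-style, e.g. '20090218133000')."""
    separator = '&' if '?' in base_url else '?'
    return base_url + separator + urlencode({'version': timestamp})

# Every reupload changes the timestamp, so the URL changes too and the
# browser fetches the new version instead of reusing a stale cached copy.
print(versioned_url('https://upload.example.org/commons/e/e0/Sig.svg',
                    '20090218133000'))
```

Because the URL itself changes on reupload, the old cached copy simply stops being referenced, so far-future Expires headers become safe.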

It looks like the solutions for the local file store and remote stores might be different, but I created a patch that relies on getTimestamp() being implemented accordingly in each subclass (LocalFile.php, ForeignAPIFile.php and so on).

Another, much "cleaner" approach would be to use the file revision number instead of the timestamp, but that would require more knowledge of the file store implementation than I have. It might also be heavier on CPU, as it would require fetching the history from the database.

Anyway, I'm attaching a patch that already works for the local file repository, where the timestamp implementation works fine.

You can see the result of this patch here: http://performance.webpagetest.org:8080/result/090219_289bbf4e150b039459abe3ba3d3ce148/ (notice that on the second run only the page itself is requested).

If it all sounds right, I can apply this patch to the tree.

Sergey

URL: http://performance.webpagetest.org:8080/result/090218_132826127ab7f254499631e3e688b24b/
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=44310

Attached:

Details

Reference
bz17577

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:33 PM
bzimport set Reference to bz17577.
bzimport added a subscriber: Unknown Object (MLST).

sergey.chernyshev wrote:

Yep, the patch doesn't include web server configuration for expiration headers.
A simple .htaccess like the following can be put into the images/ folder (if Apache has AllowOverride Indexes for it):

ExpiresActive on
ExpiresDefault A25920000
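In mod_expires syntax, A25920000 means "access time plus 25,920,000 seconds", i.e. 300 days. For completeness, a roughly equivalent nginx fragment might look like this (a sketch; it assumes images are served from an /images/ location, so adjust to your setup):

```
location /images/ {
    # ~300 days, matching ExpiresDefault A25920000 above
    expires 300d;
    add_header Cache-Control "public";
}
```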

Spiffy!

Offhand it looks good, though I'd want to double-check that there are no conflicts with remote repos and the on-demand thumbnailing.

Tim, can you take a peek at this today and see if there's any issues there? Thanks!

ayg wrote:

Does Squid currently get purged on image reupload? I suppose it must, to deal with image links that have no size specified. I had always assumed Squid is why we didn't change image URLs, but on reflection, it seems unlikely to be a big deal to do such purges occasionally.

If this works with Squid and file cache, it should probably be on by default. (Why doesn't file cache hook into Squid's purge mechanism?)

Will squids purge File:Foo.jpg?timestamp=19700101000000 entry when Foo.jpg is reuploaded?

Are pages using images on remote-repos correctly purged on image reupload?
(I think the problems of bug 1394 complicate it)
Infinite expiry images plus squids serving pages pointing to old images...

Does Squid currently get purged on image reupload?

Currently the plain page view URL does get purged from local Squids, however any *client* that has cached the image doesn't get any such notification. So, either the browser has to go back to hit the server every time it shows it to check if it's changed (slow!), or it speculatively caches it for some amount of time with the risk of showing an outdated version.

You can see this effect when you upload a new version of an image and see the old one sitting there on the File: page until you refresh.

Changing the URL with a timestamp would mean that any page which has been updated will use the updated URL, giving you the updated image version when you view it.

Are pages using images on remote-repos correctly purged on image reupload?

Nope, which is an issue to consider. There's not currently any registry of remote use, so the wiki doesn't know who to send purges to. (This would not be too hard to implement internally for DB-based repos so Commons could update the other Wikimedia sites, but would be much trickier for third-party sites using us via an API repo).

(In reply to comment #5)

Are pages using images on remote-repos correctly purged on image reupload?

Nope, which is an issue to consider. There's not currently any registry of
remote use, so the wiki doesn't know who to send purges to. (This would not be
too hard to implement internally for DB-based repos so Commons could update the
other Wikimedia sites, but would be much trickier for third-party sites using
us via an API repo).

Unless we had a dedicated action=repo or similar, the API has no way of distinguishing between normal API requests and a request to act as a repo.

sergey.chernyshev wrote:

So what do we do with this? Can this patch be scoped so that the stores that can benefit from it could use this feature?

sergey.chernyshev wrote:

Not related to the solution, but useful for measuring future performance optimizations:
http://www.showslow.com/details/?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FHillary_Clinton

sergey.chernyshev wrote:

Just to illustrate results, here's the test for TechPresentations with this patch applied (it uses local repository): http://www.webpagetest.org/result/091210_3H74/

sergey.chernyshev wrote:

It's been 2 years since I provided initial patch, but Hillary Clinton still sends 304s for static assets: http://www.webpagetest.org/result/110225_EY_5e420956c8cf54450c47902cc4e82be0/1/details/cached/

You're losing user experience and traffic.

ayg wrote:

This needs to be reviewed by someone who understands our Squid setup, like Tim or Brion. I don't think it needs a config option, it should just always be enabled, but we need to make sure the right Squid URLs are purged for it to work on Wikimedia. You're right that the status quo is unreasonable.

sergey.chernyshev wrote:

I think last time this was discussed there was another issue: you guys have a remote repository with static assets (uploads.wikimedia.org), while smaller MW installs can use the local file system to determine the version number.

In any case, it's worth implementing for both.

Sergey

No, it's not a problem for Wikimedia, since it is (for now) NFS-mounted.

It is a problem for people using us as a remote repository.

sergey.chernyshev wrote:

Basically, there are a few ways to get versions:

  • from asset itself
    • ideally crc32/md5 of content (it's actually pretty fast)
    • or modification time (which is not very good)
  • from meta-data (in case of MW, it's file revision number)
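As a rough illustration of the content-derived options above, a version token might be computed like this (a sketch, not MediaWiki code; the URL is made up):

```python
import hashlib
import zlib

def crc32_token(data: bytes) -> str:
    """Fast checksum token, as suggested above; fine for cache busting,
    where a rare collision only means a missed cache refresh."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, '08x')

def md5_token(data: bytes) -> str:
    """Content-hash token; still cheap for typical image sizes and far
    less collision-prone than crc32."""
    return hashlib.md5(data).hexdigest()[:8]

content = b'...image bytes...'
url = f'https://example.org/images/Foo.png?v={md5_token(content)}'
```

Either token changes whenever the file content changes, which is exactly what cache busting needs; the metadata (revision number) option avoids reading the file at all.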

Ideally, it should be part of the file name, e.g.
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e0/Hillary_Rodham_Clinton_Signature.svg/rev125/128px-Hillary_Rodham_Clinton_Signature.svg.png

Notice the "rev125" between the last two slashes. It can be a real folder, if you prefer to keep old files, or just a pseudo-folder used only for cache busting.

Cache busting is a must, as we have assets stored indefinitely in all possible caches, not only in Squid. I don't know if you need to tell Squid to clean old URLs or whether they'll just be LRU-evicted later.
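If "rev125" is only a pseudo-folder, the web server can strip it before touching the filesystem. A hypothetical Apache rewrite along these lines could do it (an untested sketch; the path pattern is illustrative):

```
RewriteEngine On
# Strip a /revNNN/ pseudo-folder so versioned URLs map to the real file;
# the revision number exists only to bust caches.
RewriteRule ^(.+)/rev[0-9]+/([^/]+)$ $1/$2 [L]
```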

BTW, all this applies to skin files as well. It should probably be done differently there, though, as a build script, a post-commit hook, or something like the SVN-Assets tool that checks the repository revision or the hash of the file and generates file names accordingly.

Sergey

sumanah wrote:

Adding the need-review keyword to indicate that this patch still needs to be reviewed. Thanks for the patch and sorry for the wait, Sergey.

sumanah wrote:

Sergey, I'm sorry, but because so much time has passed since you submitted your patch, trunk has changed and your patch no longer applies cleanly. If the problem's still happening, would you mind updating it and then letting me know? I'll then get a reviewer for it. Thanks.

It doesn't make any difference to me whether or not the patch is updated. There's not a significant amount of code review here, it's mostly about the idea.

sergey.chernyshev wrote:

Glad you guys are on it. I don't think I can dig into the guts of MW again to get it working right, but Tim is correct: it's not much code, just simple stuff.

Still, if you can use real or pseudo-folders in the file names, that would be even better (query strings might not work as well with caches like your Squids, and with external caches too).

BTW, the old way and the new way can co-exist if there are worries about some instances not being able to support remote repos: all you need to do is set infinite expires only on versioned URLs and keep the regular URLs intact.

(In reply to comment #19)

It's been 3 years already, but Hillary is still very slow:
http://www.webpagetest.org/result/120302_AS_8bd3281d5ad4e661f4a2ad91d0a006b9/

The second run is not significantly more efficient than the first one:
http://www.webpagetest.org/video/compare.php?tests=120302_AS_8bd3281d5ad4e661f4a2ad91d0a006b9-r:1-c:0,120302_AS_8bd3281d5ad4e661f4a2ad91d0a006b9-r:1-c:1

Well, the second run being almost the same speed is mostly due to counting the AJAX from the banner load. If you discount the AJAX, it'd be roughly 6.2 seconds vs 5.5 seconds. If you also don't count one image that took insanely long to return a 304 (which could be a rare occurrence, or could be commonplace; I don't really know), the comparison becomes 6.2 seconds vs 3.6 seconds. Hence the speedup from fixing this bug might not be as large as that test would lead you to believe (it is still probably fairly significant, though, assuming it can be done effectively).

sergey.chernyshev wrote:

Actually, I'm looking at render times, not at load events.

Adding performance keyword, and removing Tim since he's not specifically looking at this. Aaron Schulz may have an idea or two about where we should go with this.

We really should look at this one again. If the WMF infra is so problematic, then perhaps we should wrap it in a conditional, so that at least it will improve things for 'the rest of them'?

It could potentially fix the problem where people upload a new version of an image and some browsers don't purge the cached copy of the thumbnail by themselves. (We still need to tell some people to bypass their browser cache after 'upload new version' at times, even though I can't really see why a browser would *not* send a request with the current setup. Perhaps some browsers try to be too smart if there is no Cache-Control: must-revalidate and set a hidden max-age?)

sergey.chernyshev wrote:

Checking back 4.5 years later, are you guys still interested in saving traffic and increasing performance of web pages?

Any way I can help with this? Refreshing everybody's memory? Explaining the effect this can have on users and systems?

I'll be happy to do so - can even take a day or two of vacation to help.

(In reply to comment #24)

Checking back 4.5 years later, are you guys still interested in saving traffic and increasing performance of web pages?

Any way I can help with this? Refreshing everybody's memory? Explaining the effect this can have on users and systems?

I'll be happy to do so - can even take a day or two of vacation to help.

Actually, there has been recent interest in this sort of thing, but for different reasons (easier management of cache purging on the server side; obviously your reasons are good too).

sergey.chernyshev wrote:

Great, I'll be happy to see this implemented.

Krinkle renamed this task from "Image urls should have far future expires" to "Thumbnail urls should be versioned and sent with Expires headers". Jun 2 2015, 11:28 PM
Krinkle removed a project: MediaWiki-Core-Team.
Krinkle set Security to None.
Krinkle removed a subscriber: Unknown Object (MLST).
Krinkle raised the priority of this task from Low to Medium. Sep 4 2015, 2:47 AM
Krinkle updated the task description. (Show Details)

@sergey.chernyshev Thank you for trying to drive this forward for so many years. I run MediaWiki for a side project (sarna.net) and not having versioned URLs for images has always bothered me. Just happened to stumble upon this issue finally! Versioned image URLs will let me offload the images to a CDN better (with a 2+year expiry).

Your patch more-or-less works with the current source. Here's a patch based on MW 1.28:

For anyone else following along at home, there are a few other issues I've found tracking something similar:

Unfortunately the first change doesn't affect all image URLs, as it only adds the SHA1 to thumbnail URLs. My site has a lot of pages where the images are small and included directly, so I've actually turned on both supportsSha1URLs and this patch for &timestamp=, so that both image and thumbnail URLs are versioned. I'm currently testing this to make sure it will work when content changes.

It's also quite possible that the SHA1 could be added to regular image URLs (removing the need for &timestamp=), but I haven't tested that.
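For anyone trying the same combination: supportsSha1URLs is a FileRepo option, so enabling it might look roughly like this in LocalSettings.php (a sketch; verify the key against your MediaWiki version, and note that $wgLocalFileRepo must already be defined as an array for this assignment to apply):

```php
// Hypothetical fragment: ask the local repo to emit SHA-1-based thumb URLs
$wgLocalFileRepo['supportsSha1URLs'] = true;
```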

For what it's worth, I have a single-server setup that relies heavily on a CDN to offload assets and improve performance:

  • HTML pages are served from the server
  • JS and CSS via ResourceLoader (load.php) are served from the CDN. I've also hacked in a siterev=$wgCacheEpoch parameter on all load.php URLs so they can all have a 2-year expiry (and I manually clear the file cache on any MW/skin/extension/JS/CSS changes)
  • Images and Thumbnails can now be served from the CDN once they have versioned URLs

So that is why versioned URLs really help me: I can rely on the CDN to do the heavy lifting for JS, CSS and images.

Anyway, hope this helps someone else out. There is a lot of good discussion in T66214/T149847, though the design is still being worked on. In the meantime, the above two changes are currently working (being tested) for my needs.

Imarlier changed the task status from Open to Stalled. Jun 20 2018, 9:08 AM

I use a custom-made extension that adds the file timestamp to the URLs (thumbnails and original file). If anyone is interested, the code is here and can be seen on wikidex.net

I use a custom-made extension that adds the file timestamp to the URLs (thumbnails and original file). If anyone is interested, the code is here and can be seen on wikidex.net

Ciencia, I'm extremely interested in trying your WikiDexFileRepository extension, but there aren't any installation or server configuration instructions. If you're available to help me get this implemented, I'll make a pull request for you containing the instructions that worked for me. Got time to help?

...

I've updated the README.md of the repository with a sample configuration

@Krinkle I gather by your comment on T149847 that you want this. Some questions:

  • Do you agree with having the timestamp in the query string? Or should it be in the path?
  • Should MediaWiki be able to serve thumbnails of old images this way, replacing the traditional /thumb/archive URLs?
  • What should this look like after T28741? Should we have the file revision ID in the path?

If some indicator of the file version is in the path, then we can support all the legacy modes for thumbnail generation and storage, as well as supporting CDNs.

Regarding the 2017 patch file by @NicJansma: there is a configuration system for file repositories. See e.g. FileRepo::__construct().

Regarding the extension by @Ciencia_Al_Poder: I note the "/latest" component in the path.

I think the big question is: what should happen when a reupload is done? I think the image on the page needs to be updated eventually, one way or another. So it seems to me that you can either purge the HTML, purge the thumbnails, or wait for one or the other object to expire from the CDN. Purging the HTML has the advantage of not completely breaking the page display if the aspect ratio changes. Maybe we can incrementally update the HTML in the parser cache, or maybe trigger a reparse of the wikitext if the aspect ratio or some other property of the image changes.

If you just change the URL in getThumbUrl() and send an Expires header, as seems to be the original proposal, then a reupload will not take effect until either the thumbnail expires or the parser cache is invalidated. Maybe that's OK, depending on your requirements. If you have a small site, you can set an expiry time of 1 day and probably get a decent cache hit ratio out of that, and then if there is a reupload, your readers get to see it after a day. It's cheap, but it works. I don't know if it's the right tradeoff for Wikimedia, where we control the CDN, and where fast updates in response to user edits are supposedly a point of pride.

I think the big question is: what should happen when a reupload is done? I think the image on the page needs to be updated eventually, one way or another. So it seems to me that you can either purge the HTML, purge the thumbnails, or wait for one or the other object to expire from the CDN. Purging the HTML has the advantage of not completely breaking the page display if the aspect ratio changes. Maybe we can incrementally update the HTML in the parser cache, or maybe trigger a reparse of the wikitext if the aspect ratio or some other property of the image changes.

If you just change the URL in getThumbUrl() and send an Expires header, as seems to be the original proposal, then a reupload will not take effect until either the thumbnail expires or the parser cache is invalidated. Maybe that's OK, depending on your requirements. If you have a small site, you can set an expiry time of 1 day and probably get a decent cache hit ratio out of that, and then if there is a reupload, your readers get to see it after a day. It's cheap, but it works. I don't know if it's the right tradeoff for Wikimedia, where we control the CDN, and where fast updates in response to user edits are supposedly a point of pride.

Currently, reuploads cause pages that embed the file to be reparsed and get the new URL (provided by my extension). At least, that's what I saw in the job queue when a file used by other pages was reuploaded (and what our users have experienced over the 3+ years we've run this setup):

Apr 25 17:31:49 wikidex31 mwjobrunner[11129]: 2021-04-25 17:31:49 htmlCacheUpdate Archivo:EP042.png table=imagelinks recursive=1 rootJobIsSelf=1 rootJobSignature=2c3d600cd4f6b0cec8e8279bb93f86b07181dee4 rootJobTimestamp=20210425173141 causeAction=file-upload causeAgent=unknown namespace=6 title=EP042.png requestId=d1b6013eebe4c8496c2a0af9 (id=6152746,timestamp=20210425173141) STARTING
Apr 25 17:31:49 wikidex31 mwjobrunner[11129]: 2021-04-25 17:31:49 htmlCacheUpdate Archivo:EP042.png table=imagelinks recursive=1 rootJobIsSelf=1 rootJobSignature=2c3d600cd4f6b0cec8e8279bb93f86b07181dee4 rootJobTimestamp=20210425173141 causeAction=file-upload causeAgent=unknown namespace=6 title=EP042.png requestId=d1b6013eebe4c8496c2a0af9 (id=6152746,timestamp=20210425173141) t=3 good
Apr 25 17:31:59 wikidex31 mwjobrunner[11129]: 2021-04-25 17:31:59 htmlCacheUpdate Archivo:EP042.png recursive=1 table=imagelinks range={"start":160640,"end":false,"batchSize":5,"subranges":[[160640,false]]} division=1 rootJobSignature=2c3d600cd4f6b0cec8e8279bb93f86b07181dee4 rootJobTimestamp=20210425173141 causeAction=file-upload causeAgent=unknown namespace=6 title=EP042.png requestId=d1b6013eebe4c8496c2a0af9 (id=6152749,timestamp=20210425173149) STARTING
Apr 25 17:31:59 wikidex31 mwjobrunner[11129]: 2021-04-25 17:31:59 htmlCacheUpdate Archivo:EP042.png recursive=1 table=imagelinks range={"start":160640,"end":false,"batchSize":5,"subranges":[[160640,false]]} division=1 rootJobSignature=2c3d600cd4f6b0cec8e8279bb93f86b07181dee4 rootJobTimestamp=20210425173141 causeAction=file-upload causeAgent=unknown namespace=6 title=EP042.png requestId=d1b6013eebe4c8496c2a0af9 (id=6152749,timestamp=20210425173149) t=122 good
Apr 25 17:31:59 wikidex31 mwjobrunner[11129]: 2021-04-25 17:31:59 htmlCacheUpdate Archivo:EP042.png pages={"160640":[0,"Lista_de_episodios_de_la_serie_original"],"406631":[0,"Lista_de_episodios_de_la_serie_El_Comienzo"]} rootJobSignature=2c3d600cd4f6b0cec8e8279bb93f86b07181dee4 rootJobTimestamp=20210425173141 causeAction=file-upload causeAgent=unknown namespace=6 title=EP042.png requestId=d1b6013eebe4c8496c2a0af9 (id=6152750,timestamp=20210425173159) STARTING
Apr 25 17:31:59 wikidex31 mwjobrunner[11129]: 2021-04-25 17:31:59 htmlCacheUpdate Archivo:EP042.png pages={"160640":[0,"Lista_de_episodios_de_la_serie_original"],"406631":[0,"Lista_de_episodios_de_la_serie_El_Comienzo"]} rootJobSignature=2c3d600cd4f6b0cec8e8279bb93f86b07181dee4 rootJobTimestamp=20210425173141 causeAction=file-upload causeAgent=unknown namespace=6 title=EP042.png requestId=d1b6013eebe4c8496c2a0af9 (id=6152750,timestamp=20210425173159) t=39 good
Apr 25 17:32:01 wikidex31 mwjobrunner[11129]: 2021-04-25 17:32:01 htmlCacheUpdate Archivo:EP042.png pages={"10885":[0,"Lista_de_episodios_de_la_primera_temporada"],"20106":[0,"Lista_de_episodios_de_la_primera_temporada_por_fecha_de_emisi\u00f3n"],"32563":[0,"Lista_de_episodios_completa"],"77937":[0,"EP042"],"95070":[4,"Proyecto_Anime/Im\u00e1genes_del_anime/Primera_temporada"]} rootJobSignature=2c3d600cd4f6b0cec8e8279bb93f86b07181dee4 rootJobTimestamp=20210425173141 causeAction=file-upload causeAgent=unknown namespace=6 title=EP042.png requestId=d1b6013eebe4c8496c2a0af9 (id=6152748,timestamp=20210425173149) STARTING
Apr 25 17:32:02 wikidex31 mwjobrunner[11129]: 2021-04-25 17:32:02 htmlCacheUpdate Archivo:EP042.png pages={"10885":[0,"Lista_de_episodios_de_la_primera_temporada"],"20106":[0,"Lista_de_episodios_de_la_primera_temporada_por_fecha_de_emisi\u00f3n"],"32563":[0,"Lista_de_episodios_completa"],"77937":[0,"EP042"],"95070":[4,"Proyecto_Anime/Im\u00e1genes_del_anime/Primera_temporada"]} rootJobSignature=2c3d600cd4f6b0cec8e8279bb93f86b07181dee4 rootJobTimestamp=20210425173141 causeAction=file-upload causeAgent=unknown namespace=6 title=EP042.png requestId=d1b6013eebe4c8496c2a0af9 (id=6152748,timestamp=20210425173149) t=47 good

Our images are served with a 30-day Expires header, and we save a lot of bandwidth. All pages embedding the file use the new version almost instantly.

The WMF setup, where images are uploaded to Commons instead of the wiki where they're used, may make a big difference here. My guess is that a "Commons" setup doesn't invalidate the HTML cache of the pages embedding those images on other projects. That would be a big issue, and could result in reupload/revert wars on Commons, because users don't see the new version of the file on their pages. Triggering those update jobs on other WMF projects may be feasible with some development, but not on external wikis that use the InstantCommons feature. However, I don't think that should be a problem for external sites, since they may not expect an image from Commons to update.

@Krinkle I gather by your comment on T149847 that you want this. Some questions:

  • Do you agree with having the timestamp in the query string? Or should it be in the path?

I'd lean toward the path, unless that requires overcoming additional non-trivial challenges, in which case a query string might be an easier incremental step to take first. But long term, I think paths are strongly preferred, as they avoid many risks, ambiguities, and unstable expectations that complicate caching. I've learned this the hard way (T102578 and various related tickets, our current handling of /static/current in docroots, and static.php rewriting/validating to avoid populating new URL caches with old contents).

These same downsides, however, also manifest as one potential benefit: they could more easily provide fault tolerance. E.g. if our thumb paths remain magically "current", with a query string used merely as an indicator to enable cache headers and to bypass said cache, then various "bad" scenarios would simply lead to whatever is latest being served, degrading in some way (e.g. with wrong dimensions, perhaps).

But if we consider them as concrete files instead, and reference them from where they are stored, that would simplify things a lot long-term. It also means that we're not relying on the CDN layer as a way to serve old files while an update is propagating, which for popular files could easily take hours through the job queue. If the query strings carry no functional meaning and a "current" thumbnail wasn't in the CDN cache, then after an upload all references to it would immediately start serving the new version, even if the job queue hasn't yet updated the HTML. On the other hand, it could be seen as a performance optimisation if we cleverly allow reuploads that don't change dimensions to avoid reparsing any pages. But maybe it's not worth making the system harder to reason about. It does, however, mean we need a strategy in place for "seeing your own changes". This could look the same as with templates, where we propagate bumps to page_touched right away and let the job queue churn afterwards, thus allowing the uploader to visit affected pages and observe the updates right away.

  • Should MediaWiki be able to serve thumbnails of old images this way, replacing the traditional /thumb/archive URLs?

(I forgot for a second that we actually already support thumbnails of old files. I suspect we probably purge the "current" ones, so they don't carry over, but indeed we do support thumbs for archived files already. And this is used built-in via the "File history" portion of file description pages.)

It seems natural to me that, once we place the file version in the file path, we not special-case "current" versions from "old" ones. The primary motivator is preserving thumbs after a re-upload (to avoid having to immediately re-generate them from traffic to old URLs during cache propagation; or maybe we already preserve them by renaming them internally).

And while not applicable to thumbs, for originals it also seems preferable long-term to allow filesystem transactions not to have to riskily "move" files between current/old, for the same reasons as the DB layer in T28741.

But I suppose there is some utility to having archive in the URLs. It serves as a stateless and decoupled interface for system administration tasks, e.g. purging non-current thumbnails without any assumptions or runtime interaction with MediaWiki. I suppose the highest timestamp or img_id is still a fairly simple interface, but implementation-wise it's less trivial than a simple wildcard match. This seems useful, and I think we've made active use of it at WMF in the past. It also seems like it would become more significant with this change, because if we expose file IDs in the URL by default, then that will be the URL people copy around, so there'd be a fair amount of background noise travelling to those older URLs, more than currently, where traffic is likely limited to crawlers of the file description page requesting the small thumbs there, and nothing beyond that.

I don't know if this is significant enough to warrant keeping it, but at least it might be worth leaving it as-is and dealing with that in a separate change, assuming it won't complicate the interim state much in terms of code complexity/risk?

  • What should this look like after T28741? Should we have the file revision ID in the path?

I think so, yes. Though this does bring up questions around transitioning, upgrading, and InstantCommons compatibility. If T28741 happens first, then the transition is limited to the (rarely used) archive thumbs, which we may be able to afford to handle in a less sophisticated way than current-version thumbnails.

[…]
I think the big question is: what should happen when a reupload is done? I think the image on the page needs to be updated eventually, one way or another. So it seems to me that you can either purge the HTML, purge the thumbnails, or wait for one or the other object to expire from the CDN. Purging the HTML has the advantage of not completely breaking the page display if the aspect ratio changes. Maybe we can incrementally update the HTML in the parser cache, or maybe trigger a reparse of the wikitext if the aspect ratio or some other property of the image changes.

If the current thumbnail URLs are versioned, stored as such on disk, and regeneratable as such (not relying solely on the CDN cache to functionally satisfy old URLs), then to not break page display we wouldn't have to do anything. But yes, I think it would make sense to handle file reuploads the same as template edits. In order to satisfy "seeing your own changes", the current strategy for links update should work (leading with page_touched bumps and CDN purges, trailing with the job queue warming up parser caches).

Skipping this when the aspect ratio is identical would not work, I think. We'd have to make the same (old versioned) thumbnail URLs resolve, after a purge, to the newer thumbnail, and the actor in this case might experience problems with their browser cache if we previously told the browser it could cache these for a long time. That's not the end of the world, but not requiring tech-savvy hard refreshes and such seems preferable. Over the past ten years I've helped with, or seen done, the removal of the need for such "clear/bypass your browser cache" instructions.

Also, I think even in the current system, queueing a links update would help address some bugs with regard to aspect-ratio changes, so that might be the first step to carve out from this ticket (ref T109214, T279205). While these bugs may be short-lived at WMF given CDN churn of 1-4 days, I imagine third-party wikis with hotlinked InstantCommons are more affected, since they would not get the new dimensions until their parser cache + HTTP cache rolls over. By using versioned thumbnails, these third-party wikis would effectively change their behaviour back to how it was when InstantCommons used a non-zero apiThumbCacheExpiry by default. Good news: less page breakage. Bad news: less frequent updates.

My gut feeling is that for WMF the added JobQueue load and ParserCache churn would be minor in comparison to what we do with templates and Wikidata today.

Also, I haven't touched at all on the cross-wiki nature of file repositories. If we go in this direction, that would make FileRepo the first service in MediaWiki core to support cross-wiki links updates, something we've done a few times now in extensions (Wikidata, MassMessage, CentralAuth).