
JPG thumbnail chaining leads to over-sharpened thumbnails
Closed, Resolved · Public

Description

Some images, especially those with tiny structures like foliage, suffer from the newly enabled JPG thumbnail chaining (T69525). Would it be possible to create the intermediate images without sharpening and always serve a thumbnail that was made from one of the intermediate pictures instead of sending these intermediate ones directly?

Event Timeline

Rillke raised the priority of this task to Low.
Rillke updated the task description.
Rillke changed Security from none to None.
Rillke subscribed.

It's true that small images might be more sharpened than before, but it's very difficult to reproduce the old sharpening value, because now multiple passes happen whereas in the past it was just one pass.

We can tweak the settings a little, but keep in mind that:

  • There's always been sharpening applied, and it has always caused complaints about specific images. Applying sharpening is a balancing act, and what matters is what works for most people and most images. That means some specific images will always be left out and suffer from a treatment that improves the perceived quality of thumbnails for the majority of images.
  • The current settings were tested on images with thin lines/details in a small blind A/B test with a set of volunteers, and while some people with a keen eye were able to tell the lower quality apart, most people couldn't. By tweaking the settings, we risk making something less universal, and we'll probably get complaints about thumbnails being too soft.

Before turning this into something actionable, have the current thumbnails been compared to what MediaWiki would generate before thumbnail chaining was enabled? I see screenshots of desktop software, which to me seems like the wrong comparison to make regarding chaining. It becomes a question of whether sharpening should be applied at all, which is independent of chaining and should be a ticket of its own. If people compare to what MediaWiki generates without chaining turned on, they might be surprised that it's still too sharpened for their taste.

Before turning this into something actionable, have the current thumbnails been compared to what MediaWiki would generate before thumbnail chaining was enabled?

I don't think so (it would be hard for those who brought it up to do that comparison unless they're provided with a wiki they can test on). Looking at this page, I immediately notice that thumbnails of different sizes are sharpened differently: the first is very sharp, the second rather blurry, the fourth sharp again, ...

Addressing the sharpening issue in general:
Some people will likely kill me for proposing this, but what about a |sharpening= parameter for the [[File:]] inclusion syntax (preferred, as the effect also depends on the size), or a setting configurable on the file description page of an image? This would truly be a different issue/task, but it could address the problem that some images will always look worse under this setting than others. On the other hand, a lot more thumbs would have to be cached and file handling would get even more complex.

Addressing the issue of differently sharpened thumbnails:

  • Have an option to turn chaining off for an image or thumbnail
  • Create dedicated chaining thumbnails with less sharpening

@Gilles, the chaining algorithm should know whether it is using the original file or a larger thumb to generate the current thumbnail. You could easily choose a different (lower) sharpening factor if you are rescaling a previously rescaled (and sharpened) thumb.

Edit: Oh, Rillke suggested pretty much this already...

Would it help if we provided the exact sequence of ImageMagick commands used by MediaWiki so editors could experiment? ImageMagick is fairly easy to use and to install on both Linux and Windows, and making live changes with a two-week review cycle is not an efficient way of figuring out the best settings...
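For instance, a single step could be reproduced for experimentation with something like the following (a rough sketch, not the exact production invocation; 0x0.4 is what I believe MediaWiki's default $wgSharpenParameter to be, and the other flags are simplified):

    # One MediaWiki-style thumbnail step (simplified sketch; assumes
    # ImageMagick's convert is installed). 0x0.4 is assumed to mirror
    # MediaWiki's default $wgSharpenParameter.
    convert source.jpg -quality 80 -resize 640x480 -sharpen 0x0.4 thumb.jpg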

Some people will likely kill me for proposing this, but what about a |sharpening= parameter for

We don't have the disk capacity to add another parametrized variant for thumbnails. Thumbnails are still stored in Swift with redundant copies, which makes the issue worse. It's simply too costly to offer that kind of option at this time. I mean costly in terms of actual hardware and dollars.

As pointed out earlier, ImageMagick always sharpened while desktop software generally doesn't for mere resizing. That's not something I introduced and I would personally be completely fine with never sharpening, because IMHO it's destructive and just a visual trick. But clearly sharpening was originally introduced because some people thought that it helped with the perceived quality of the majority of thumbnails. In that context, I wanted to limit how many people would be upset by the quality change and I tried to stay close to the original sharpening, but it was inevitably going to be slightly different.

It would be more meaningful to compare this tree image with the same thumbnail that would be generated by stock MediaWiki. I know that the chained ones will still look slightly worse, but the difference will probably be more subtle.

@Gilles, the chaining algorithm should know whether it is using the original file or a larger thumb to generate the current thumbnail. You could easily choose a different (lower) sharpening factor if you are rescaling a previously rescaled (and sharpened) thumb.

This is the naive approach and certainly the first to try, but there is nothing obvious about the consensus on perceived sharpness and quality. When I ran my small survey I was surprised to see that more people actually preferred the chained thumbnails (with their known extra sharpening), despite the clear quality loss from a theoretical perspective. I didn't expect that at all. Like I said earlier, I saw that a couple of people with a keen eye spotted the chained ones and systematically voted them down, but they were the minority by far.

A new solution has to be put in front of a lot of people to establish that it's a superior choice. It's a trivial code change, but very involved user testing to make sure that we're not making things worse for the majority. I'm all for doing that properly: testing different variants within an automated framework, where Commons users could go to a page and pick the best thumbnail out of three, over a large set of random images. Then it becomes much easier to iterate and just try ideas out. It's a much larger project, but it's also something that can be reused in the future when we want to change what we use to render thumbnails, etc.

Regarding the proposed change of applying it less for smaller sizes down in the chain, the opposite approach might actually work better. I.e. don't sharpen when going big image -> big image but only when going medium image -> small thumb. Because it's my understanding that the sharpening is mostly there to maintain perceived lines when they usually get lost in small sizes. Anyway, it would be nice to be able to try these two ideas and more.
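As a rough sketch, the "only sharpen the last step" variant could be prototyped with plain convert calls (hypothetical bucket sizes, not the production pipeline; 0x0.4 is assumed to be MediaWiki's default sharpen value):

    # Unsharpened intermediate buckets, chained from one another.
    convert original.jpg -resize 2048x2048 bucket-2048.jpg
    convert bucket-2048.jpg -resize 1024x1024 bucket-1024.jpg
    # The final, small thumbnail is the only step that sharpens.
    convert bucket-1024.jpg -resize 320x320 -sharpen 0x0.4 thumb-320.jpg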

In T76983#832239, @Tgr wrote:

Would it help if we provided the exact sequence of ImageMagick commands used by MediaWiki so editors could experiment? ImageMagick is fairly easy to use and to install on both Linux and Windows, and making live changes with a two-week review cycle is not an efficient way of figuring out the best settings...

Sadly it's not just a single command; the sharpening depends on the distance, in terms of size, between the source image and the requested image to generate. The "recipe" comes with ifs and thens defining in which cases the sharpening should be applied. Add chaining to the mix, which could have its own rules as suggested above, and it becomes complicated to express and very tedious to test manually. I don't think that people who want to contribute to finding a better recipe can do so without programming. If they focus on a single command for a limited set of test images, they likely won't provide something that we can actually turn into code, because it will work for those images and those sizes, but it won't tell us what the general thresholds and rules should be.
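To give a flavour of those ifs and thens, here is a rough shell sketch of one step (the 0.85 threshold and 0x0.4 sigma are my assumptions about MediaWiki's defaults, $wgSharpenReductionThreshold and $wgSharpenParameter; treat them as values to experiment with, not the production recipe):

    #!/bin/bash
    # One conditional resize step: sharpen only when the step shrinks
    # the image substantially. Threshold and sigma are assumptions.
    resize_step() {
        local src=$1 dst=$2 width=$3
        local sw sh
        read -r sw sh < <(identify -format '%w %h' "$src")
        local th=$(( sh * width / sw ))  # target height, aspect preserved
        if (( (width + th) * 100 / (sw + sh) < 85 )); then
            convert "$src" -resize "$width" -sharpen 0x0.4 "$dst"
        else
            convert "$src" -resize "$width" "$dst"
        fi
    }
    resize_step original.jpg thumb-640.jpg 640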

When I ran my small survey I was surprised to see that more people actually preferred the chained thumbnails (with their known extra sharpening), despite the clear quality loss from a theoretical perspective. I didn't expect that at all. Like I said earlier, I saw that a couple of people with a keen eye spotted the chained ones and systematically voted them down, but they were the minority by far.

I'm afraid that the results from your survey might just be invalid (maybe this is what's causing all the fuss):
When I looked into the issue I remembered actually having taken this survey myself some time ago. One had to compare sharpness/quality on a scale from 1 to 10. Now if I remember correctly, I rated the sharpness according to the actual sharpness in the image (1 being too soft and 10 being hopelessly over-sharpened). Today I assume (or could at least imagine) that 1 was meant to be poor sharpening and 10 was meant to be good sharpening (whatever "poor" and "good" mean in this context).
If more people fell for this ambiguity, it's possible that the results you got are a mix from two groups who simply understood the question differently...

When I ran my small survey I was surprised to see that more people actually preferred the chained thumbnails (with their known extra sharpening), despite the clear quality loss from a theoretical perspective. I didn't expect that at all. Like I said earlier, I saw that a couple of people with a keen eye spotted the chained ones and systematically voted them down, but they were the minority by far.

I'm afraid that the results from your survey might just be invalid (maybe this is what's causing all the fuss):
When I looked into the issue I remembered actually having taken this survey myself some time ago. One had to compare sharpness/quality on a scale from 1 to 10. Now if I remember correctly, I rated the sharpness according to the actual sharpness in the image (1 being too soft and 10 being hopelessly over-sharpened). Today I assume (or could at least imagine) that 1 was meant to be poor sharpening and 10 was meant to be good sharpening (whatever "poor" and "good" mean in this context).
If more people fell for this ambiguity, it's possible that the results you got are a mix from two groups who simply understood the question differently...

I was also asking to rate general quality, not just sharpness. Regardless of how people interpreted the question about sharpness, the one about quality was unambiguous and that's the part I was referring to.

I was also asking to rate general quality, not just sharpness. Regardless of how people interpreted the question about sharpness, the one about quality was unambiguous and that's the part I was referring to.

Yeah, sorry, but I'm quite sure this didn't do anything to resolve the ambiguity:
Since you were specifically asking about sharpness in the first question (and didn't give any details on whether there were other differences), I tried to rate overall quality without giving too much weight to sharpness (otherwise there could have been only a single scale to start with, so I assumed that with this second question you were aiming at compression artifacts resulting from sharpening, or something like that).
I'm quite sure I wasn't the only participant with this thought, so the results for the quality question are not a representative measure of the sharpening quality at all.

@Gilles:

The results from your survey look invalid. There is a quality loss like day and night. Example: https://commons.wikimedia.org/wiki/File:Commons_gegen_irfanview_gegen_gimp,_jweils_das_Original_auf_800px_verkleinert.jpg and https://commons.wikimedia.org/wiki/File:%C5%A0%C3%A1%C5%A1ovsk%C3%BD_hrad_%28by_Pudelek%29_1.JPG
Sharpness is very important. If thumbs are bad, uploaders become unhappy and stop uploading files.

One had to compare sharpness/quality on a scale from 1 to 10. Now if I remember correctly, ...

The survey is still up; you can view it at SurveyMonkey.

As pointed out earlier, ImageMagick always sharpened while desktop software generally doesn't for mere resizing. That's not something I introduced and I would personally be completely fine with never sharpening, because IMHO it's destructive and just a visual trick. But clearly sharpening was originally introduced because some people thought that it helped with the perceived quality of the majority of thumbnails. In that context, I wanted to limit how many people would be upset by the quality change and I tried to stay close to the original sharpening, but it was inevitably going to be slightly different.

Sharpening was introduced in 2006/2007: https://phabricator.wikimedia.org/T8193. We had long discussions back then; most links can be found in that Bugzilla report.

Of course sharpening is always destructive, but resizing images is destructive too, and it leads to a more convenient view. Usually, though, sharpening is applied only once, mostly as the last step in image processing. Stacking/chaining more than one sharpening step is a beginner's mistake.

@Gilles, the chaining algorithm should know whether it is using the original file or a larger thumb to generate the current thumbnail. You could easily choose a different (lower) sharpening factor if you are rescaling a previously rescaled (and sharpened) thumb.

This is the naive approach and certainly the first to try, but there is nothing obvious about the consensus on perceived sharpness and quality. When I ran my small survey I was surprised to see that more people actually preferred the chained thumbnails (with their known extra sharpening), despite the clear quality loss from a theoretical perspective. I didn't expect that at all. Like I said earlier, I saw that a couple of people with a keen eye spotted the chained ones and systematically voted them down, but they were the minority by far.

Maybe the survey was misunderstood. I tried to take it and gave up quickly, because I was uncertain how the judging was meant.

Regarding the proposed change of applying it less for smaller sizes down in the chain, the opposite approach might actually work better. I.e. don't sharpen when going big image -> big image but only when going medium image -> small thumb. Because it's my understanding that the sharpening is mostly there to maintain perceived lines when they usually get lost in small sizes. Anyway, it would be nice to be able to try these two ideas and more.

Agreed. Back then we tried/tested several settings of convert / ImageMagick. I don't know which setting was implemented and has been working fine for years now, but it is clear that it cannot be chained without producing massive artifacts in the images.

In T76983#832239, @Tgr wrote:

Would it help if we provided the exact sequence of ImageMagick commands used by MediaWiki so editors could experiment? ImageMagick is fairly easy to use and to install on both Linux and Windows, and making live changes with a two-week review cycle is not an efficient way of figuring out the best settings...

Sadly it's not just a single command; the sharpening depends on the distance, in terms of size, between the source image and the requested image to generate. The "recipe" comes with ifs and thens defining in which cases the sharpening should be applied. Add chaining to the mix, which could have its own rules as suggested above, and it becomes complicated to express and very tedious to test manually. I don't think that people who want to contribute to finding a better recipe can do so without programming.

It can be done on the command line in a loop. Yes, this is a lot of work, but IMHO not necessary, because most of it was already done in 2007 ;-)

If they focus on a single command for a limited set of test images, they likely won't provide something that we can actually turn into code, because it will work for those images and those sizes, but it won't tell us what the general thresholds and rules should be.

Agreed. I'd suggest applying sharpening only if the resulting image is smaller than e.g. 480px on the long edge. Leave all bigger scalings unchanged and use only the 480px images as the source for further downscaling. The exact size needs to be tested; maybe 640 or 400 is better. Caching images of that size and smaller should not be a big problem in terms of disk space, and processing time should also be lower compared to working with originals of 16, 36 or even more megapixels.
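A minimal sketch of that scheme (the 480px cutoff is the value to be tested, as said above; the bucket sizes and the 0x0.4 sharpen value are just examples):

    # Buckets above the cutoff: no sharpening, rendered from the original.
    convert original.jpg -resize 2048x2048 bucket-2048.jpg
    convert original.jpg -resize 1024x1024 bucket-1024.jpg
    convert original.jpg -resize 480x480 bucket-480.jpg
    # Sizes below the cutoff: rendered from the smallest unsharpened
    # bucket, with a single sharpening pass.
    for size in 320 240 120; do
        convert bucket-480.jpg -resize ${size}x${size} -sharpen 0x0.4 thumb-${size}.jpg
    done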

Sharpening was introduced in 2006/2007: https://phabricator.wikimedia.org/T8193. We had long discussions back then; most links can be found in that Bugzilla report.

Thanks for the link, it's very useful, but the discussion seems to have happened between a small number of users, all German-speaking, with no blind testing and over a small set of test images. Times were different back then, but it's difficult to point to that history and consider it sufficient justification for the feature by our current standards. No sharpening at all should be added to the list of variants to blind-test against.

It would be more meaningful to compare this tree image with the same thumbnail that would be generated by stock MediaWiki. I know that the chained ones will still look slightly worse, but the difference will probably be more subtle.

At the Commons:Forum, user Tors has posted such a comparison: stock MediaWiki on the left, Commons on the right (of course you'll need to view the image at full size, or it'll have another round of sharpening applied ;-). It seems to be the standard Commons file page preview size, so probably somewhere in the middle of the bucket chain? Have a look at the roof, the edges of the stones in the wall or the window shutters … that's seriously oversharpened, imho.

I'll try only sharpening the last step when I have some time, since it seems like the reasonable thing to do as a stopgap measure and might get us closer to the old results. But let's not bikeshed over this: I'd rather work on building a true blind-testing environment to get to the bottom of what's universally perceived as best than take part in endless arguments about what value X or Y should be, without any data to back any version.

For now, trying to emulate something closer to what stock MediaWiki gives is worthwhile, but expectations should be realistic: it won't be identical.

Sounds like a plan.

But I've got a feeling that people who are not used to judging the quality of sharpening might simply prefer the one they perceive as sharper, because they don't know the signs of oversharpening. At least I'm not sure whether I would have preferred the left or the right one in the above example two years ago …

So instead of giving people two versions of an image and asking them which they like better, what about the following procedure (just brainstorming here): give them a series of maybe 10 or 15 versions with a steadily increasing amount of sharpening applied, and ask them to identify the first one they perceive as "sharp enough" as well as the first one with "too much sharpening".

Repeat with several different base images, of course. Maybe even shift the scale by starting with an artificially blurred or already sharpened image in position 1 every once in a while, so that the unsharpened original may be somewhere in the middle of the series, or not in it at all. That way the person taking the survey wouldn't know which one is the original.

Possibly repeat that for different thumb sizes.
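Such a series could be generated with a simple loop, e.g. (the sigma values and the 800px size are arbitrary illustration, not calibrated survey settings):

    # Ten versions of one thumbnail with steadily increasing sharpening,
    # plus a slightly blurred variant so the unsharpened baseline doesn't
    # always sit at position 1.
    convert original.jpg -resize 800x800 base.jpg
    for i in $(seq 1 10); do
        sigma=$(awk "BEGIN{print $i * 0.2}")
        convert base.jpg -sharpen 0x${sigma} variant-${i}.jpg
    done
    convert base.jpg -blur 0x0.5 variant-0.jpg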

Sounds like a fair plan to determine which amount is best, but then the question is a little skewed, because it implies that sharpening is needed, when other large websites where image quality is equally critical don't sharpen their thumbnails at all. Plus, it's only one part of the equation; sharpening isn't that isolated, as it's already combined with the choice of a specific resizing algorithm. Maybe sharpening is useful for some algorithms but not others, etc. I'd see that as a second step: if we pick a recipe that includes sharpening in some cases, we can tweak that parameter further.

Sharpening was introduced in 2006/2007: https://phabricator.wikimedia.org/T8193. We had long discussions back then; most links can be found in that Bugzilla report.

Thanks for the link, it's very useful, but the discussion seems to have happened between a small number of users, all German-speaking, with no blind testing and over a small set of test images.

The set of test images was much bigger, but many were hosted externally, and those external pages have since gone away.

Times were different back then,

As described, we used convert with several different settings. Has convert/ImageMagick changed that much?

No sharpening at all should be added to the list of variants to blind-test against.

Of course the remaining gallery page doesn't make sense anymore. It was built when MediaWiki did not apply sharpening, so the examples could be compared within Wikipedia back then.

But doing a blind test is obviously nonsense. It is common sense that people, given the choice, will always prefer colourful, high-contrast, sharp-looking images. In a blind test you will in any case get results that tend toward the overprocessed versions. Believe me, I have been processing photos for more than 40 years, and this was already true with chemical photography.

But for Wikipedia purposes we must not look for the versions with the biggest wow factor; we need to look for the closest possible representation of the original image. The current setting applies a crispness to the images that does not really exist. Regardless of the number and language of the judges in our approach at that time, we tried to find a setting with as little impact as possible. It seems to me the result was widely accepted for a few years.

Your statement is contradictory, you dismiss destructive post-processing as something people will lean towards because it artificially makes the images "look better", yet you defend a specific destructive post-processing step. If the goal is indeed to stay as close to the original as possible for encyclopedic and fidelity purposes, no sharpening should be applied. Why should sharpening get a pass and something like a slight saturation increase be a big no-no?

Furthermore, the interesting argument you make about voting over this sort of thing being meaningless because people will always lean towards the processed version can be applied against the voting you had organized about sharpening.

Sadly it's not just a single command, the sharpening depends on the distance in terms of size between the source image and the requested image to generate. The "recipe" comes with ifs and thens defining in what cases the sharpening should be applied. Add chaining to the mix, which could have its own rules like suggested above and it becomes complicated to express and very tedious to test manually. I don't think that people who want to contribute to finding a better recipe can do so without programming. If they focus on a single command for a limited set of test images, they likely won't provide something that we can actually turn into code, because it will work for those images and those sizes, but it won't tell us what the general thresholds and rules should be.

Then disable chaining for now, and create a small tool on Labs which can do left/right comparisons between two different processing chains? It could be done in a fairly generic way to allow testing of other transformations as well. Maybe something along the lines of Quarry, with ImageMagick processing steps instead of SQL commands; people could choose two processing methods and any Commons image and make comparisons.

I also wonder how it would influence the speed/quality tradeoff if transformations were not chained (i.e. intermediary thumbnails were always created from the original file).

Either way, making the final step different from the intermediary ones seems like a complex change: either a new zone or a new thumbnail parameter would be needed.

Addressing the sharpening issue in general:
Some people will likely kill me for proposing this, but what about a |sharpening= parameter for the [[File:]] inclusion syntax (preferred, as the effect also depends on the size), or a setting configurable on the file description page of an image? This would truly be a different issue/task, but it could address the problem that some images will always look worse under this setting than others. On the other hand, a lot more thumbs would have to be cached and file handling would get even more complex.

IMO that would be a mistake from both an ops and a usability perspective: we should move in a direction where editors set the semantics and the software figures out how best to do it in a given environment, not one where editors specify what should happen and that then breaks down horribly in a different environment (such as mobile). Per-file processing hints (via Wikibase, once it's in place) might make sense, but per-page, definitely not...

Your statement is contradictory, you dismiss destructive post-processing as something people will lean towards because it artificially makes the images "look better", yet you defend a specific destructive post-processing step. If the goal is indeed to stay as close to the original as possible for encyclopedic and fidelity purposes, no sharpening should be applied. Why should sharpening get a pass and something like a slight saturation increase be a big no-no?

If you do not understand the difference between careful enhancements and massive manipulations, further discussion makes no sense.

Furthermore, the interesting argument you make about voting over this sort of thing being meaningless because people will always lean towards the processed version can be applied against the voting you had organized about sharpening.

There is a difference between the votes of authors and the votes of customers.

If you do not understand the difference

@Smial: Please refrain from such postings and read the etiquette first. Stay technical instead of personal.

I'm leaning towards disabling sharpening altogether for the following reasons:

  • File formats other than JPG aren't being sharpened at the moment. It doesn't make sense to me that a thumbnail based on a PNG wouldn't get sharpened while one based on a JPG does. Especially considering that JPG is lossy, we're likely sharpening compression artifacts and making them more pronounced.
  • I've heard the occasional complaint about the old values giving poor results on specific images already (diagrams and the like). Sure, chaining makes it worse, but the old way didn't please everyone either.
  • Sharpening is a destructive operation
  • We should strive to generate "derivative" images as close to the original as possible

Thoughts?

I'm leaning towards disabling sharpening altogether for the following reasons:

  • File formats other than JPG aren't being sharpened at the moment. It doesn't make sense to me that a thumbnail based on a PNG wouldn't get sharpened while one based on a JPG does. Especially considering that JPG is lossy, we're likely sharpening compression artifacts and making them more pronounced.

Other formats are usually used for graphics, not for photos.

  • I've heard the occasional complaint about the old values giving poor results on specific images already (diagrams and the like). Sure, chaining makes it worse, but the old way didn't please everyone either.

Ditto. Diagrams shouldn't be in JPEG format. Apart from that, we probably want to recognize them as diagrams at some point anyway? Google has had this kind of recognition for years.

  • Sharpening is a destructive operation

Okay. But do you know what Flickr does? Looking at my uploads there, I often have the feeling they are doing the right thing. And they appear to apply sharpening (especially if the thumbnail is very small compared to the original).

  • We should strive to generate "derivative" images as close to the original as possible

We should strive for what the readers like to see. Again, Flickr has solved this (to my liking). If thumbnails become too blurry, we won't gain anything.

Thoughts?

Okay, as an interim solution, sharpening could possibly be shut off. If people start uploading their photos in PNG format, something is rotten.

Before I roll this back, I'd like to measure what the performance impact of chaining has been. This will help us assess whether it's worth fixing, or whether it didn't bring the expected performance gains. Give me a few more days to gather enough data and I'll disable chaining by the end of the week regardless of the results. If I disable it now without gathering stats, we won't be able to tell whether it's worth perfecting.

Change 180138 had a related patch set uploaded (by Gilles):
Disable thumbnail chaining

https://gerrit.wikimedia.org/r/180138


Before I roll this back, I'd like to measure what the performance impact of chaining has been.

Would be interested in the results :) Could you measure an improvement? If not, would it have improved over time (e.g. an initial slowdown due to missing buckets)?

IMO that would be a mistake from both an ops and a usability perspective: we should move into a direction where editors can set the semantics and the software figures out how to do it best in a given environment

+1, no more image markup settings PLEASE.

Change 180138 merged by jenkins-bot:
Disable thumbnail chaining

https://gerrit.wikimedia.org/r/180138

Gilles reopened this task as Open.
Gilles moved this task from Needs code review to Done on the Multimedia board.
Gilles moved this task from Done to Doing on the Multimedia board.

Would be interested in the results :) Could you measure an improvement? If not, would it have improved over time (e.g. an initial slowdown due to missing buckets)?

Results are available: https://lists.wikimedia.org/pipermail/multimedia/2015-January/000989.html