
Varnish won't purge thumbnails of specific file
Closed, Declined · Public

Event Timeline

As the thumbnails that won't update are copyvios, maybe this should be considered a security problem, but I'm unsure.

This seems to be an issue with Varnish purging. Purging that file with debugging turned on, I can clearly see MediaWiki issuing the order to purge those files, including the problematic thumbnails that remain old no matter what: https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.10.22/mediawiki?id=AWab6v-X00on8STvlYvw&_g=h@44136fa

This mechanism doesn't involve Thumbor at that stage. Thumbor would happily regenerate that file if given the chance; it's just that Varnish won't remove it from its cache.

In fact, if you craft a URL that deliberately works around the Varnish cache, you can see that the thumbnail is regenerated just fine: https://upload.wikimedia.org/wikipedia/commons/thumb/6/61/Pami%C4%99tniki_lekarzy_%281939%29.djvu/page13-1024px-doowapPami%C4%99tniki_lekarzy_%281939%29.djvu.jpg
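For anyone wanting to repeat that check on another file, here is a rough Python sketch of how such a URL can be built. It only reproduces the pattern visible in the link above (extra text inserted in front of the file name in the last path segment). `with_cache_buster` is a hypothetical helper name, the token is arbitrary, and the assumption that the thumbnail backend ignores the extra text is inferred from that one URL, not from any documentation.

```python
def with_cache_buster(url: str, token: str) -> str:
    """Insert `token` before the original file name in the thumbnail name.

    Assumes the usual thumb URL shape: .../<original name>/<prefix>-<original name>.<ext>
    """
    base, original_name, thumb_name = url.rsplit("/", 2)
    idx = thumb_name.rfind(original_name)
    if idx == -1:
        raise ValueError("thumbnail name does not contain the original file name")
    # The changed last segment gives a URL that is not in the cache yet.
    busted = thumb_name[:idx] + token + thumb_name[idx:]
    return f"{base}/{original_name}/{busted}"


cached_url = (
    "https://upload.wikimedia.org/wikipedia/commons/thumb/6/61/"
    "Pami%C4%99tniki_lekarzy_%281939%29.djvu/"
    "page13-1024px-Pami%C4%99tniki_lekarzy_%281939%29.djvu.jpg"
)
print(with_cache_buster(cached_url, "doowap"))
```

With the token `doowap` this yields the same URL as the one linked above.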

Gilles renamed this task from Outdated Copyright violation thumbnails for djvu file on Commons to Varnish won't purge thumbnails of specific file. Oct 22 2018, 1:39 PM
Gilles assigned this task to BBlack.
Gilles edited projects, added Traffic; removed Thumbor, SRE-swift-storage.

Most likely, this is related to URI normalization rules (note %-encoded chars in the relevant titles) and/or the generation of purges at the origins (tracking known thumbnails for purging). Historically, AFAIK we've never found a case where Varnish actually refuses or fails to purge an object. It's all about it being asked to purge the wrong object, and/or there being multiple available path encodings for the same object.

The last time I looked at the normalization stuff, it was a bunch of commits done near the bottom of T127387 back in February. There were still 4 outstanding patches at the time, chained up near https://gerrit.wikimedia.org/r/c/operations/puppet/+/407643 . The work halted because I realized I still had some misunderstandings of what the canonical encoding rules actually were for our applications, and how they might differ between MW, RB, and/or File (Swift) storage. We really needed to go back to the drawing board to ensure it was understood properly, or risk causing more issues than we're solving.
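To make the "multiple available path encodings for the same object" point concrete, here is a minimal Python sketch. It is not the Varnish VCL or the MediaWiki purge code. `canonical_path` is a hypothetical helper, and the canonical form it picks (decode everything, then re-encode with `urllib.parse.quote`) is an assumption for the demo only, since the real canonical rules are exactly what was unclear.

```python
from urllib.parse import quote, unquote


def canonical_path(path: str) -> str:
    """Hypothetical normalizer: decode all %-escapes, then re-encode once.

    Caveat: this also turns an encoded slash (%2F) into a real path
    separator, which a production normalizer would have to handle.
    """
    return quote(unquote(path), safe="/")


# Three spellings of the same path, as a cache keyed on the raw URL sees them.
variants = [
    "/wikipedia/commons/thumb/6/61/Pami%C4%99tniki_lekarzy_%281939%29.djvu",
    "/wikipedia/commons/thumb/6/61/Pami%c4%99tniki_lekarzy_(1939).djvu",
    "/wikipedia/commons/thumb/6/61/Pamiętniki_lekarzy_(1939).djvu",
]

raw_keys = set(variants)
normalized_keys = {canonical_path(v) for v in variants}
print(len(raw_keys), len(normalized_keys))
```

If the purge is issued for one spelling while the cached object was stored under another, and nothing normalizes both to the same key, the purge matches nothing and the old object stays in cache.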

ema triaged this task as Medium priority. Nov 1 2018, 10:02 AM

All the thumbnails are OK already and thumb.php seems to work for this file. So IMO this can be closed.

This is really bizarre. It's the second time this has happened, and the previously affected file didn't seem to have special characters besides a dash (there could be more than one bug involved, though): https://upload.wikimedia.org/wikipedia/commons/thumb/d/d8/PL_Jean_de_La_Fontaine_-_Bajki.djvu/page657-1024px-PL_Jean_de_La_Fontaine_-_Bajki.djvu.jpg

Could it be that it "fixed itself", only by virtue of falling out of Varnish cache?

Let's close as declined, as no one can reproduce it anymore?

Sure, 'till next time ;)