outdated DjVu file page thumbnail in cache
Open, Medium, Public

Description

The thumbnail:

https://upload.wikimedia.org/wikipedia/commons/thumb/e/ee/Karol_May_-_Old_Surehand_01.djvu/page256-854px-Karol_May_-_Old_Surehand_01.djvu.jpg

needs to be regenerated. It was generated from an outdated version of the file:

HTTP/1.1 200 OK
Date: Wed, 31 Jan 2018 20:32:10 GMT
Content-Type: image/jpeg
Content-Length: 156630
Connection: keep-alive
X-Object-Meta-Sha1Base36: feuze8dsgid99y4ti1gv3wbvld6xxox
Content-Disposition: inline;filename*=UTF-8''Karol_May_-_Old_Surehand_01.djvu.jpg
Last-Modified: Fri, 08 Jan 2016 07:47:03 GMT
Etag: a81b9d5936394d1d2a1a7341c53058c9
X-Timestamp: 1452239222.85447
X-Trans-Id: txd5b25d42412b41cab95b1-005a721e98
X-Varnish: 499281636, 338202450 336658680, 921780112
Via: 1.1 varnish-v4, 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
Age: 2354
X-Cache: cp1048 pass, cp3034 hit/10, cp3035 miss
X-Cache-Status: hit-local
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
X-Analytics: https=1;nocookies=1

while the current file version is dated 2016-08-23T23:28:43, i.e. later than the thumbnail's Last-Modified date.
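The staleness check described above can be sketched in Python. This is only an illustration: the function name `thumbnail_is_stale` is hypothetical, and it assumes the file version timestamp is in UTC, which the report does not state.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def thumbnail_is_stale(last_modified_header: str, file_version_iso: str) -> bool:
    """Return True if the thumbnail was stored before the current file version."""
    thumb_time = parsedate_to_datetime(last_modified_header)
    if thumb_time.tzinfo is None:
        # Treat a zone-less HTTP date as UTC.
        thumb_time = thumb_time.replace(tzinfo=timezone.utc)
    # Assumption: the file version timestamp carries no zone and is UTC.
    file_time = datetime.fromisoformat(file_version_iso).replace(tzinfo=timezone.utc)
    return thumb_time < file_time


# Values taken from the headers and the file version quoted in this task.
print(thumbnail_is_stale("Fri, 08 Jan 2016 07:47:03 GMT", "2016-08-23T23:28:43"))
```

For the values in this task the thumbnail predates the current file version by more than seven months, so the check reports it as stale.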

Event Timeline

Aklapper renamed this task from "oudated DjVu file page thumbnail in cache" to "outdated DjVu file page thumbnail in cache". Feb 1 2018, 11:41 AM

I get the same identifying headers when I query that URL directly from Swift storage internally, indicating this is not an edge caching issue:

curl -I http://ms-fe.svc.eqiad.wmnet/wikipedia/commons/thumb/e/ee/Karol_May_-_Old_Surehand_01.djvu/page256-854px-Karol_May_-_Old_Surehand_01.djvu.jpg
[...]
Content-Length: 156630
X-Object-Meta-Sha1Base36: feuze8dsgid99y4ti1gv3wbvld6xxox
Last-Modified: Fri, 08 Jan 2016 07:47:03 GMT
Etag: a81b9d5936394d1d2a1a7341c53058c9
X-Timestamp: 1452239222.85447

I suspect this has more to do with (possibly old) thumbnailing or djvu-handling issues than caching?
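The comparison behind that conclusion can be sketched as follows: if the headers that identify the stored object match between the public URL and the internal Swift frontend, the edge caches are serving exactly what the backend holds. The helper names and the choice of `IDENTIFYING_HEADERS` are assumptions made for illustration, not an established tool.

```python
# Headers assumed (for this sketch) to identify a specific stored object.
IDENTIFYING_HEADERS = (
    "Content-Length",
    "X-Object-Meta-Sha1Base36",
    "Last-Modified",
    "Etag",
    "X-Timestamp",
)


def parse_headers(raw: str) -> dict:
    """Parse `curl -I`-style output into a dict with lower-cased header names."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skips the status line and placeholders such as "[...]"
            headers[name.strip().lower()] = value.strip()
    return headers


def same_stored_object(edge: dict, origin: dict) -> bool:
    """True if every identifying header is present in both and has the same value."""
    return all(
        edge.get(h.lower()) is not None and edge.get(h.lower()) == origin.get(h.lower())
        for h in IDENTIFYING_HEADERS
    )


edge = parse_headers("""HTTP/1.1 200 OK
Content-Length: 156630
X-Object-Meta-Sha1Base36: feuze8dsgid99y4ti1gv3wbvld6xxox
Last-Modified: Fri, 08 Jan 2016 07:47:03 GMT
Etag: a81b9d5936394d1d2a1a7341c53058c9
X-Timestamp: 1452239222.85447
Age: 2354""")
origin = parse_headers("""[...]
Content-Length: 156630
X-Object-Meta-Sha1Base36: feuze8dsgid99y4ti1gv3wbvld6xxox
Last-Modified: Fri, 08 Jan 2016 07:47:03 GMT
Etag: a81b9d5936394d1d2a1a7341c53058c9
X-Timestamp: 1452239222.85447""")
print(same_stored_object(edge, origin))
```

Cache-specific headers such as Age or X-Cache are deliberately left out of the comparison, since they differ between layers even when the object is the same.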

Thumbnails newly generated at other sizes are correct, e.g.

https://upload.wikimedia.org/wikipedia/commons/thumb/e/ee/Karol_May_-_Old_Surehand_01.djvu/page256-855px-Karol_May_-_Old_Surehand_01.djvu.jpg

shows page #232 (and not the incorrect page #230, which is the 256th page of the older DjVu file version).
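To check other sizes in the same way, the width in a thumbnail URL can be swapped programmatically. The sketch below assumes the `page<N>-<W>px-<name>` pattern seen in the URLs in this task; the regular expression and the function name `with_width` are illustrative and not part of MediaWiki.

```python
import re

# Pattern inferred from the thumbnail URLs quoted in this task (an assumption).
THUMB_RE = re.compile(
    r"^(?P<prefix>.*/)page(?P<page>\d+)-(?P<width>\d+)px-(?P<name>[^/]+)$"
)


def with_width(url: str, width: int) -> str:
    """Return the same page thumbnail URL with a different pixel width."""
    match = THUMB_RE.match(url)
    if match is None:
        raise ValueError(f"not a recognised page thumbnail URL: {url}")
    return f"{match['prefix']}page{match['page']}-{width}px-{match['name']}"


stale = (
    "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ee/"
    "Karol_May_-_Old_Surehand_01.djvu/"
    "page256-854px-Karol_May_-_Old_Surehand_01.djvu.jpg"
)
print(with_width(stale, 855))
```

Applied to the stale 854px URL with a width of 855, this yields the 855px URL quoted above.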

Well, yes, semantics :)

It is a "caching" problem in some general sense of the word, but in terms of pointing fingers at different parts of our infrastructure and code, it doesn't look like an edge caching problem (the outer-most HTTP caches managed by the Traffic team). The generation of thumbnails and storing (caching?) of them into Swift is a whole other layer of issue.

Indeed, it looks like not all thumbs for all pages have been purged, and some for the older version are still in Swift. @Ankry does purging the file page manually do the trick for you?

I tried to do it manually 3-4 times, with no effect.

Dzahn triaged this task as Medium priority. Feb 1 2018, 11:32 PM

As resolving this problem has taken over a month and the incorrect thumbnail disrupts volunteers' work on the "Old Surehand" book in Polish Wikisource, I uploaded a local copy of the DjVu file.

The appropriate plwikisource thumbnail (which overrides the one from Commons) is correct:
https://upload.wikimedia.org/wikisource/pl/thumb/e/ee/Karol_May_-_Old_Surehand_01.djvu/page256-854px-Karol_May_-_Old_Surehand_01.djvu.jpg

The copy may be removed when this problem is resolved.

What do we know at the moment? What component is responsible for this? What has been debugged? Are there any related tasks?