
Thumbor should handle "temp" thumbnail requests
Closed, Resolved (Public)

Description

Nov 15 09:01:33 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/8/88/20161115090130%21fYJSjm.pdf/page1-71px-20161115090130%21fYJSjm.pdf.jpg
Nov 15 09:01:51 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/8/88/20161115090148%21ZpODuw.pdf/page1-71px-20161115090148%21ZpODuw.pdf.jpg
Nov 15 09:09:16 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/2/29/20161115090912%21jAh7y9.pdf/page1-71px-20161115090912%21jAh7y9.pdf.jpg
Nov 15 09:09:18 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/6/69/20161115090911%2141FZQ8.pdf/page1-71px-20161115090911%2141FZQ8.pdf.jpg
Nov 15 09:09:30 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/2/2b/20161115090924%21bZd8pY.pdf/page1-71px-20161115090924%21bZd8pY.pdf.jpg
Nov 15 09:11:11 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/e/e7/20161115091108%21j5pVeD.pdf/page1-71px-20161115091108%21j5pVeD.pdf.jpg
Nov 15 09:30:06 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/b/be/20161115092951%21chunkedupload_fccb7fca4a4f.jpg/66px-20161115092951%21chunkedupload_fccb7fca4a4f.jpg
Nov 15 09:45:02 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/3/37/20161115094450%21chunkedupload_e289fbf86657.jpg/66px-20161115094450%21chunkedupload_e289fbf86657.jpg
Nov 15 10:02:01 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/1/16/20161115100147%21chunkedupload_2efb66975b46.jpg/66px-20161115100147%21chunkedupload_2efb66975b46.jpg
Nov 15 10:58:45 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/temp/6/66/20161115105842%21bpBvCj.pdf/page1-77px-20161115105842%21bpBvCj.pdf.jpg
  • Reading those files requires authentication, so we can't rely on the http loader like we currently do. I'm not sure whether that means we'll have a dual approach or convert the existing mechanism to load originals with the swift client. I think the http loader is probably more reliable due to its simplicity, so the swift client will probably be an exception for the temp case.
  • The hash prefix has to be recalculated. For some stupid legacy reason the original and the thumbnail don't have the same hash prefix: the thumbnail's prefix is the md5 of the whole filename, date included, while the original's is the md5 of the filename minus the date and exclamation point.
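
The prefix mismatch described in the second point can be sketched in Python. This is a minimal illustration, assuming the prefix is the first hex digit of the MD5 followed by the first two (as in the `8/88` style paths in the log above). The function names and the timestamp regex are hypothetical and not taken from the Thumbor plugin code.

```python
import hashlib
import re

# Temp file names look like "<14-digit timestamp>!<name>"
# (assumed from the URLs in the log above).
TEMP_NAME = re.compile(r"^\d{14}!(?P<name>.+)$")


def hash_prefix(name):
    """Return 'x/xy' built from the first hex digits of md5(name)."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "{}/{}".format(digest[0], digest[:2])


def thumbnail_prefix(temp_name):
    """Thumbnails hash the whole file name, timestamp included."""
    return hash_prefix(temp_name)


def original_prefix(temp_name):
    """Originals hash the name without the timestamp and '!'."""
    match = TEMP_NAME.match(temp_name)
    stripped = match.group("name") if match else temp_name
    return hash_prefix(stripped)
```

Both functions expect a URL-decoded name, so `%21` in the request path has to be turned back into `!` before hashing.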

Revisions and Commits

rTHMBREXT Thumbor Plugins
Restricted Differential Revision

Event Timeline

Gilles added a revision: Restricted Differential Revision. (Nov 28 2016, 2:06 PM)

Looking at a recent example to figure out the proper paths:

http://upload.wikimedia.org/wikipedia/commons/thumb/temp/b/b7/20161128144034%21chunkedupload_2dc9bd04abac.jpg/84px-20161128144034%21chunkedupload_2dc9bd04abac.jpg

The original resides at:

header, response = swift.get_object('wikipedia-commons-local-temp.23', '2/23/20161128144034!chunkedupload_2dc9bd04abac.jpg')

So far I can't figure out where the thumbnail is stored with "swift list", though. I'll try reproducing some code with eval.php to find out.

According to code I ran in eval.php, the thumbnail should be stored at:

mwstore://local-multiwrite/local-temp/thumb/b/b7/20161128144034!chunkedupload_2dc9bd04abac.jpg/84px-20161128144034!chunkedupload_2dc9bd04abac.jpg

Confirmed:

>>> header, response = swift.get_object('wikipedia-commons-local-temp.b7', 'thumb/b/b7/20161128144034!chunkedupload_2dc9bd04abac.jpg/84px-20161128144034!chunkedupload_2dc9bd04abac.jpg')
>>> header
{'content-length': '3526', 'x-object-meta-sha1base36': 'm406jfjl4aket3mgu0jq31ye04wc2ym', 'content-disposition': "inline;filename*=UTF-8''20161128144034%21chunkedupload_2dc9bd04abac.jpg", 'accept-ranges': 'bytes', 'last-modified': 'Mon, 28 Nov 2016 14:41:02 GMT', 'etag': '0749c9972f02bd278be0fce8d0de1d3f', 'x-timestamp': '1480344061.99063', 'x-trans-id': 'tx30270967f8f646a9a268e-00583d488c', 'date': 'Tue, 29 Nov 2016 09:21:16 GMT', 'content-type': 'image/jpeg'}
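
Putting the two lookups above together, the mapping from a temp thumbnail URL to Swift locations might be sketched as follows. This assumes the container name pattern `<project>-<wiki>-local-temp.<shard>` seen in the two `get_object` calls, with the shard being the two-digit hash prefix. `swift_locations` and its return shape are hypothetical, and the pattern has only been checked against the single example in this comment.

```python
import hashlib
import re
from urllib.parse import unquote, urlparse

# Path layout assumed from the example URL:
# /<project>/<wiki>/thumb/temp/<x>/<xy>/<name>/<thumbnail name>
TEMP_THUMB = re.compile(
    r"^/(?P<project>[^/]+)/(?P<wiki>[^/]+)/thumb/temp/"
    r"(?P<x>[0-9a-f])/(?P<xy>[0-9a-f]{2})/(?P<name>[^/]+)/(?P<thumb>[^/]+)$"
)


def swift_locations(url):
    """Return ((container, object) for the thumbnail, same for the original)."""
    match = TEMP_THUMB.match(unquote(urlparse(url).path))
    if match is None:
        raise ValueError("not a temp thumbnail URL: %s" % url)
    parts = match.groupdict()
    base = "{project}-{wiki}-local-temp".format(**parts)

    # The thumbnail reuses the hash prefix that is already in the URL.
    thumb = (
        "{}.{}".format(base, parts["xy"]),
        "thumb/{x}/{xy}/{name}/{thumb}".format(**parts),
    )

    # The original's prefix is recomputed from the name without "<date>!".
    bare_name = parts["name"].split("!", 1)[-1]
    digest = hashlib.md5(bare_name.encode("utf-8")).hexdigest()
    original = (
        "{}.{}".format(base, digest[:2]),
        "{}/{}/{}".format(digest[0], digest[:2], parts["name"]),
    )
    return thumb, original
```

Each returned pair could then be passed to `swift.get_object(container, obj)` as in the calls above.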

Change 324919 had a related patch set uploaded (by Gilles):
Upgrade to 0.1.30

https://gerrit.wikimedia.org/r/324919

Change 324919 merged by Filippo Giunchedi:
Upgrade to 0.1.30

https://gerrit.wikimedia.org/r/324919

Change 330869 had a related patch set uploaded (by Gilles):
Switch Thumbor to swift loader

https://gerrit.wikimedia.org/r/330869

Change 330869 merged by Filippo Giunchedi:
Switch Thumbor to swift loader

https://gerrit.wikimedia.org/r/330869

The swift loader logs a noisy error. I have to check whether it only occurs for legitimate 404s:

Jan 20 10:31:37 thumbor1002 thumbor@8817[37138]: 2017-01-20 10:31:37,852 8817 thumbor:ERROR [SWIFT_LOADER] get_object failed: ClientException('Object GET failed',)

Also, those 404s are not making it to the 404 log anymore, possibly because the filtering for the 404 log was based on an error from the http(s) loader.
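
One way to separate legitimate 404s from real failures, sketched here under assumptions: python-swiftclient's `ClientException` carries an `http_status` attribute, so a missing object can be logged quietly while other failures stay at error level. `FakeClientException`, `log_swift_failure`, and the logger name are hypothetical stand-ins used to keep the sketch stdlib-only, not the loader's actual code.

```python
import logging

logger = logging.getLogger("swift_loader_sketch")  # hypothetical logger name


class FakeClientException(Exception):
    """Stand-in for swiftclient's ClientException (stdlib-only sketch)."""

    def __init__(self, msg, http_status=None):
        super().__init__(msg)
        self.http_status = http_status


def log_swift_failure(exc):
    """Log a failed get_object call; return True when it was a plain 404."""
    status = getattr(exc, "http_status", None)
    if status == 404:
        # Missing object: expected for bad URLs, so keep it out of the error log.
        logger.info("[SWIFT_LOADER] object not found: %r", exc)
        return True
    logger.error("[SWIFT_LOADER] get_object failed (status %s): %r", status, exc)
    return False
```

The boolean return value could also be used to decide whether a request belongs in the 404 log.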

Possibly related: some metrics aren't being reported on Grafana anymore:

https://grafana.wikimedia.org/dashboard/db/thumbor

Change 333858 had a related patch set uploaded (by Gilles):
Switch Thumbor to swift loader

https://gerrit.wikimedia.org/r/333858

Change 333858 merged by jenkins-bot:
Switch Thumbor to swift loader

https://gerrit.wikimedia.org/r/333858

Fixes for the 404 log are coming in a different task. I'm no longer seeing /temp 404s in the swift logs.

Reopening this, as today we saw what looks like a file descriptor leak in thumbor for swift connections. In addition, constant outbound network traffic was observed on the thumbor machines; it went away once thumbor was restarted.