
Thumbor 404s when the original has a ? in its filename
Closed, Resolved (Public)

Description

Nov 15 09:00:48 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/f/fc/%E8%AA%AA%E5%A5%BD%E4%BA%86%E7%9A%84%E8%87%AA%E6%B2%BB%E5%91%A2%3F.jpg/192px-%E8%AA%AA%E5%A5%BD%E4%BA%86%E7%9A%84%E8%87%AA%E6%B2%BB%E5%91%A2%3F.jpg
Nov 15 10:30:01 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/2/23/Fran%C3%A7ois_Boucher_-_Are_They_Thinking_About_the_Grape%3F_-_WGA02889.jpg/qlow-104px-Fran%C3%A7ois_Boucher_-_Are_They_Thinking_About_the_Grape%3F_-_WGA02889.jpg
Nov 15 11:18:50 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/Rhyme%3F_and_reason%3F_%281883%29_%2814593643027%29.jpg/133px-Rhyme%3F_and_reason%3F_%281883%29_%2814593643027%29.jpg
Nov 15 11:56:22 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Interieur,_overzicht_tijdens_restauratie_%28%3F%29_-_Rolduc_-_20357536_-_RCE.jpg/1024px-Interieur,_overzicht_tijdens_restauratie_%28%3F%29_-_Rolduc_-_20357536_-_RCE.jpg
Nov 15 12:05:31 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Krypt,_basement_van_het_westelijk_zuiltje_in_de_noord_conche_uit_het_oosten_met_restanten_lemen_vloer.%28%3F%29_-_Rolduc_-_20190341_-_RCE.jpg/1024px-Krypt,_basement_van_het_westelijk_zuiltje_in_de_noord_conche_uit_het_oosten_met_restanten_lemen_vloer.%28%3F%29_-_Rolduc_-_20190341_-_RCE.jpg

Example file page: https://commons.wikimedia.org/wiki/File:Krypt,_basement_van_het_westelijk_zuiltje_in_de_noord_conche_uit_het_oosten_met_restanten_lemen_vloer.%28%3F%29_-_Rolduc_-_20190341_-_RCE.jpg

Event Timeline

It might be two issues: some requests seem to have special characters in them, and others have /thumb/temp.

Ah, I guess temp goes to a different container?

Ah, reading thumb.php, it seems to read the original from the public zone, not the temp zone, so those are different things. I noticed, though, that path hashing has to be corrected for temp, which might explain the issue encountered here: the temp original and the temp thumbnail have different hash values (one is computed with the date, the other isn't).

I'm going to double check, but it would suggest that Thumbor has to compensate for the wrong hash values being passed in the URL and recompute the correct name hash value where the original will be found...
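To make the hashing mismatch concrete, here is a minimal Python sketch. It assumes the usual MediaWiki hashed-directory layout (first hex digit of the name's MD5, then the first two) and assumes a temp name is a 14-digit timestamp, a "!", then the base name, as in 20161121140233!Ft3W43.pdf. The helpers hash_path and strip_timestamp are hypothetical names for illustration, not Thumbor or MediaWiki functions.

```python
import hashlib


def hash_path(name: str, levels: int = 2) -> str:
    """Return an 'a/ab' style shard path from the MD5 of a file name.

    Assumption: this mirrors MediaWiki's hashed upload directories.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "/".join(digest[:i] for i in range(1, levels + 1))


def strip_timestamp(temp_name: str) -> str:
    """Drop a leading 'YYYYMMDDHHMMSS!' prefix if one is present (assumed format)."""
    prefix, sep, rest = temp_name.partition("!")
    if sep and len(prefix) == 14 and prefix.isdigit():
        return rest
    return temp_name


temp_name = "20161121140233!Ft3W43.pdf"
with_date = hash_path(temp_name)                      # hash of the full temp name
without_date = hash_path(strip_timestamp(temp_name))  # hash of the base name only

print("hashed with date:   ", with_date)
print("hashed without date:", without_date)
```

If the two printed paths differ, a client that receives one of them in the URL would have to recompute the other before it can locate the original.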

It seems to be the temp container after all. This code is confusing as hell... So it would seem that not only is the hash correction needed, but Thumbor also needs to be authenticated with Swift to fetch the temp original. I'll write a bit of Python to see if the Thumbor Swift user can access that file...

@fgiunchedi attempting to get an object from the temp container with the mw:thumbor credentials gives me a 403:

swiftclient.exceptions.ClientException: Object GET failed: http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-temp.52/5/52/20161121140233%21Ft3W43.pdf 403 Forbidden  [first 60 chars of response] <html><h1>Forbidden</h1><p>Access was denied to this resource
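A check like the one described above could be sketched as follows. This is not the script that was actually run. The helper can_read is a hypothetical name, and it only assumes the connection object exposes head_object(container, obj) and raises an exception carrying an http_status attribute, as python-swiftclient's Connection and ClientException do.

```python
def can_read(conn, container: str, obj: str) -> bool:
    """Return True if the object is readable, False on an auth failure.

    `conn` is expected to behave like swiftclient.client.Connection.
    Errors other than 401/403 (for example a 404) are re-raised so they
    are not mistaken for a permissions problem.
    """
    try:
        conn.head_object(container, obj)
    except Exception as exc:
        if getattr(exc, "http_status", None) in (401, 403):
            return False
        raise
    return True


# Usage against a real cluster (auth URL and key are placeholders):
#
#   from swiftclient.client import Connection
#   conn = Connection(authurl="<auth url>", user="mw:thumbor", key="<key>")
#   can_read(conn, "wikipedia-commons-local-temp.52",
#            "5/52/20161121140233!Ft3W43.pdf")
```

Running the same check against a thumb container and a temp container with the same credentials shows whether the failure is specific to temp.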

@Gilles indeed I've granted mw:thumbor user access only to thumb containers, looks like we'll need to do the same for temp containers too. I'll take this while I'm doing the access.

Mentioned in SAL (#wikimedia-operations) [2016-11-22T19:48:30Z] <godog> set thumbor access for temp containers - T150760

The perms should be fixed everywhere now @Gilles

Now that I've separated the temp case into its own task, I see that the remaining ones look like an encoding problem. I notice one sequence in particular that all these URLs have in common: %3F, which is the percent-encoded form of "?". That makes sense as a problematic character, since an unencoded "?" starts the query string in a URL.
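One way a "?" in a filename can go wrong is sketched below with Python's standard library. It illustrates the general failure mode (decoding a URL before splitting it into path and query), under the assumption that something in the request path does this. The task does not confirm that this is the exact bug in Thumbor.

```python
from urllib.parse import quote, unquote, urlsplit

name = "Rhyme?_and_reason?_(1883)_(14593643027).jpg"
base = "http://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/"

# Percent-encode the file name as it appears in the logged URL.
encoded = quote(name, safe="")
encoded_url = base + encoded

# Splitting the still-encoded URL keeps the whole name in the path.
print(urlsplit(encoded_url).path)

# Decoding first and splitting afterwards turns the first "?" into the
# start of a query string, so the path is cut off after "Rhyme".
decoded_url = unquote(encoded_url)
parts = urlsplit(decoded_url)
print(parts.path)   # truncated path
print(parts.query)  # the rest of the file name ends up here
```

A lookup based on the truncated path would miss the original, which would be consistent with a 404 for these files only.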

Gilles renamed this task from "Thumbor 404s on a number of images Mediawiki is successful with" to "Thumbor 404s when the original has a ? in its filename". Nov 23 2016, 10:28 AM
Gilles updated the task description.

Change 324919 had a related patch set uploaded (by Gilles):
Upgrade to 0.1.30

https://gerrit.wikimedia.org/r/324919

Change 324919 merged by Filippo Giunchedi:
Upgrade to 0.1.30

https://gerrit.wikimedia.org/r/324919