Nov 15 09:00:48 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/f/fc/%E8%AA%AA%E5%A5%BD%E4%BA%86%E7%9A%84%E8%87%AA%E6%B2%BB%E5%91%A2%3F.jpg/192px-%E8%AA%AA%E5%A5%BD%E4%BA%86%E7%9A%84%E8%87%AA%E6%B2%BB%E5%91%A2%3F.jpg
Nov 15 10:30:01 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/2/23/Fran%C3%A7ois_Boucher_-_Are_They_Thinking_About_the_Grape%3F_-_WGA02889.jpg/qlow-104px-Fran%C3%A7ois_Boucher_-_Are_They_Thinking_About_the_Grape%3F_-_WGA02889.jpg
Nov 15 11:18:50 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/Rhyme%3F_and_reason%3F_%281883%29_%2814593643027%29.jpg/133px-Rhyme%3F_and_reason%3F_%281883%29_%2814593643027%29.jpg
Nov 15 11:56:22 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Interieur,_overzicht_tijdens_restauratie_%28%3F%29_-_Rolduc_-_20357536_-_RCE.jpg/1024px-Interieur,_overzicht_tijdens_restauratie_%28%3F%29_-_Rolduc_-_20357536_-_RCE.jpg
Nov 15 12:05:31 ms-fe1001 proxy-server: HTTP status code mismatch. Mediawiki: 200 Thumbor: 404 URL: http://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Krypt,_basement_van_het_westelijk_zuiltje_in_de_noord_conche_uit_het_oosten_met_restanten_lemen_vloer.%28%3F%29_-_Rolduc_-_20190341_-_RCE.jpg/1024px-Krypt,_basement_van_het_westelijk_zuiltje_in_de_noord_conche_uit_het_oosten_met_restanten_lemen_vloer.%28%3F%29_-_Rolduc_-_20190341_-_RCE.jpg
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Upgrade to 0.1.30 | operations/debs/python-thumbor-wikimedia | master | +5 K -230
Revisions and Commits
rTHMBREXT Thumbor Plugins
rTHMBREXT671af4b52492 Handle originals with question marks in their name
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Gilles | T121388 Service-based thumbnailing re-architecture in production with Thumbor
Resolved | | Gilles | T139606 add thumbor to production infrastructure
Resolved | | Gilles | T150760 Thumbor 404s when the original has a ? in its filename
Event Timeline
It might be two issues: some requests have special characters in them, and others have /thumb/temp.
Ah, reading thumb.php, it seems like it's reading the original from the public zone, not the temp zone. Seems like they are different things. I noticed, though, that path hashing has to be corrected for temp, which might explain the issue encountered here. I.e. the temp original and the temp thumbnail have different hash values (one computes the hash with the date, the other doesn't).
I'm going to double check, but it would suggest that Thumbor has to compensate for the wrong hash values being passed in the URL and recompute the correct name hash value where the original will be found...
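To make the hash mismatch concrete, here is a minimal sketch of how a hash path could be recomputed from a file name. It assumes the usual MediaWiki layout, where the directory levels are prefixes of the MD5 hex digest of the name (as in the `f/fc/` and `2/23/` segments of the URLs above). The `hash_path` helper is hypothetical, not code from Thumbor or its plugins, and which of the two names below each side actually hashes is exactly what still needs checking.

```python
import hashlib


def hash_path(name, levels=2):
    """Return MediaWiki-style hash directories for a file name.

    Assumption: each level is a growing prefix of md5(name),
    e.g. "a/ab" for two levels.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "/".join(digest[:i] for i in range(1, levels + 1))


# Temp file names carry a timestamp prefix, so hashing the name with
# or without that prefix generally yields different directories.
with_date = hash_path("20161121140233!Ft3W43.pdf")
without_date = hash_path("Ft3W43.pdf")
print(with_date, without_date)
```

If the thumbnail URL carries the hash of one form of the name and the original is stored under the hash of the other, the service has to recompute the second value itself rather than reuse the one from the URL.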
It seems to be the temp container after all. This code is confusing as hell... So it would seem that not only is the hash correction needed, Thumbor also needs to be authenticated with Swift to fetch the temp original. I'll write a bit of Python to see if the Thumbor Swift user can access that file...
@fgiunchedi attempting to get an object from the temp container with the mw:thumbor credentials gives me a 403:
swiftclient.exceptions.ClientException: Object GET failed: http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-temp.52/5/52/20161121140233%21Ft3W43.pdf 403 Forbidden [first 60 chars of response] <html><h1>Forbidden</h1><p>Access was denied to this resource
@Gilles indeed I've granted mw:thumbor user access only to thumb containers, looks like we'll need to do the same for temp containers too. I'll take this while I'm doing the access.
Mentioned in SAL (#wikimedia-operations) [2016-11-22T19:48:30Z] <godog> set thumbor access for temp containers - T150760
Now that I've separated the temp case into its own task, I see that the remaining ones look like an encoding problem. I notice one character in particular that all these URLs have in common: %3F, which is the percent-encoding of "?". Makes sense as a problematic character.
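One plausible way an encoded "?" can break things, sketched below with Python's standard `urllib.parse`: if a URL is percent-decoded before it is split into components, the decoded "?" is taken as the start of the query string and the file name gets truncated. This is an illustration of the failure mode only; the `thumb_original` helper is hypothetical, and I haven't confirmed that this is the exact code path at fault in Thumbor.

```python
from urllib.parse import unquote, urlsplit

URL = (
    "http://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/"
    "Rhyme%3F_and_reason%3F_%281883%29_%2814593643027%29.jpg/"
    "133px-Rhyme%3F_and_reason%3F_%281883%29_%2814593643027%29.jpg"
)


def thumb_original(url, decode_first):
    """Extract the original file name from a thumbnail URL.

    decode_first=True mimics the suspected bug: decoding the whole URL
    before splitting turns %3F into a literal "?" query delimiter.
    """
    if decode_first:
        path = urlsplit(unquote(url)).path
    else:
        path = urlsplit(url).path
    # After "/thumb/": hash level 1, hash level 2, original name, thumb name.
    segments = path.split("/thumb/", 1)[1].split("/")
    # Decode only the single segment we need, after splitting.
    return unquote(segments[2])


print(thumb_original(URL, decode_first=True))   # Rhyme
print(thumb_original(URL, decode_first=False))  # Rhyme?_and_reason?_(1883)_(14593643027).jpg
```

Splitting on the still-encoded URL and decoding each segment afterwards keeps the "?" inside the file name, which is what the lookup of the original needs.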