Captcha png images are being sent as text/html instead of image/png
Doesn't affect much as browsers sniff images, but still not ideal.
Loaded the login page, it downloaded this URL:
alex@alex-laptop:~$ curl -sI 'https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:Captcha/image&wpCaptchaId=1063782475' | grep ^Content-Type
Content-Type: application/x-www-form-urlencoded
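One caveat with the check above: `grep ^Content-Type` is case-sensitive, so it would miss the lowercase `content-type:` that curl prints for HTTP/2 responses. A minimal sketch of a more tolerant filter, assuming a POSIX shell with standard tr/grep/sed; the function name `content_type_of` is made up for illustration:

```shell
# Hypothetical helper: read raw response headers on stdin and print only
# the value of the last Content-Type header, whatever its capitalisation.
content_type_of() {
    tr -d '\r' \
        | grep -i '^content-type:' \
        | tail -n1 \
        | sed 's/^[^:]*:[[:space:]]*//'
}
```

It would be used as `curl -sI "$url" | content_type_of`, where `$url` is the captcha image URL.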
To figure out the object name I needed the salt and hash, so I dug them out of memcached using ObjectCache::getMainStashInstance()->get( wfMemcKey( 'captcha', 1063782475 ) )
The object name will be $hash[0] . '/' . $hash[1] . '/' . $hash[2] . '/' . "image_" . $salt . "_" . $hash . ".png", and it'll grab it from the global-data-captcha-render container
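For anyone repeating this, the same concatenation can be sketched in shell. This is only an illustration of the expression above, assuming a POSIX shell; the function name `captcha_object_name` and the sample salt and hash are placeholders, not real captcha data:

```shell
# Hypothetical helper: build the object name from a salt and a hash,
# mirroring $hash[0] . '/' . $hash[1] . '/' . $hash[2] . '/' .
# "image_" . $salt . "_" . $hash . ".png".
# Note: the variables are plain globals, since POSIX sh has no "local".
captcha_object_name() {
    salt=$1
    hash=$2
    c1=$(printf '%s' "$hash" | cut -c1)   # first character of the hash
    c2=$(printf '%s' "$hash" | cut -c2)   # second character
    c3=$(printf '%s' "$hash" | cut -c3)   # third character
    printf '%s/%s/%s/image_%s_%s.png\n' "$c1" "$c2" "$c3" "$salt" "$hash"
}

# Placeholder values only.
captcha_object_name 'examplesalt' 'abcdef0123'
# prints a/b/c/image_examplesalt_abcdef0123.png
```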
Logged into the Swift FE machine, became root, went to /etc/swift, then:
root@deployment-ms-fe02:/etc/swift# . account_AUTH_mw.env
root@deployment-ms-fe02:/etc/swift# swift --debug download global-data-captcha-render snip/snip/snip/image_snip_snip.png --no-download 2>&1 | grep Content-Type | tail -n1
DEBUG:swiftclient:RESP HEADERS: [('Content-Length', '14037'), ('X-Object-Meta-Sha1Base36', 'op0wkkjz73i87crgjb88mfs6ae9scio'), ('X-Object-Meta-Mtime', '1343430123.000000'), ('Accept-Ranges', 'bytes'), ('Last-Modified', 'Wed, 13 Jul 2016 21:32:47 GMT'), ('Etag', '9b6adb2110169fbd83a79a7b8f6b3774'), ('X-Timestamp', '1468445566.92396'), ('Content-Type', 'application/x-www-form-urlencoded'), ('X-Trans-Id', 'tx45ec6042ab1d4c6b845a3-005a9b22dd'), ('Date', 'Sat, 03 Mar 2018 22:34:05 GMT'), ('Connection', 'keep-alive')]
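That RESP HEADERS line is long enough that the Content-Type is easy to miss. A rough sketch of a filter that prints only the value, assuming the debug output keeps the `('Content-Type', '...')` tuple format shown above (other swiftclient versions may format it differently); the function name `swift_debug_content_type` is made up:

```shell
# Hypothetical helper: read "swift --debug ... 2>&1" output on stdin and
# print the Content-Type value from the last RESP HEADERS line.
swift_debug_content_type() {
    grep 'RESP HEADERS' \
        | tail -n1 \
        | sed -n "s/.*('Content-Type', '\([^']*\)').*/\1/p"
}
```

It would replace the `grep Content-Type | tail -n1` part of the pipeline above.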
So, it looks like there's no messing with the Content-Type at the retrieval end of the captcha system.
So the captchas get/got uploaded to Swift with the wrong mime type.
That Last-Modified date is from around the time of my NFS -> Swift migration: T64835#2459268
So the file itself may be very old
That task shows that I ran:
root@deployment-ms-fe01:/data/project/upload7/private/captcha# swift upload global-data-captcha-render *
(I'm so glad I kept records of important commands I was running so we could refer back to them later)
Looks like swiftclient does guess the Content-Type from the file extension though:
root@deployment-ms-fe02:/etc/swift# touch testfile.html
root@deployment-ms-fe02:/etc/swift# swift upload global-data-captcha-render testfile.html
testfile.html
root@deployment-ms-fe02:/etc/swift# swift --debug download global-data-captcha-render testfile.html --no-download 2>&1 | grep Content-Type | tail -n1
DEBUG:swiftclient:RESP HEADERS: [('Content-Length', '0'), ('Content-Type', 'text/html'), ('Accept-Ranges', 'bytes'), ('Last-Modified', 'Sat, 03 Mar 2018 23:14:02 GMT'), ('Etag', 'd41d8cd98f00b204e9800998ecf8427e'), ('X-Timestamp', '1520118841.60094'), ('X-Object-Meta-Mtime', '1520118831.473796'), ('X-Trans-Id', 'tx46d161c44565461a9d53f-005a9b2c3e'), ('Date', 'Sat, 03 Mar 2018 23:14:06 GMT'), ('Connection', 'keep-alive')]
The answer probably lies in some python-swiftclient or python-requests version used by the Trusty -ms-fe01 host on which I originally did the migration (-ms-fe02 came along in T162247 and it runs Jessie)
Let's see if we can just regenerate all captcha images
Couldn't quite just use https://wikitech.wikimedia.org/wiki/Generating_CAPTCHAs because it expects prod (antispam) secrets to be in place (the ones in labs/private.git are predictably useless)
Had to download equivalents from a couple of helpful GitHub repos
Am regenerating now
Mentioned in SAL (#wikimedia-cloud) [2018-03-04T02:12:54Z] <Krenair> Regenerated captcha images for T164047
Well that did the trick
krenair@deployment-tin:/srv/mediawiki-staging/php-master/extensions/ConfirmEdit$ mwscript extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php aawiki --wordlist=/tmp/words --font=/usr/share/fonts/truetype/freefont/FreeMonoBoldOblique.ttf --blacklist=/tmp/badwords --fill=10000 --verbose --delete
Generating 10000 new captchas.. Done.
Generated 10000 captchas in 775.4 seconds
Getting a list of old captchas to delete... Done.
Copying the new captchas to storage... Done.
Copied 10000 captchas to storage in 158.5 seconds
Deleting 10000 old captchas... Done.
Deleted 10000 old captchas in 120.1 seconds
Removing temporary files... Done.
Whole captchas generation process took 1061.1 seconds
Stuff to get the word lists is at deployment-tin:/home/krenair/get-words-lists.sh