Captchas sent with wrong mime type on beta
Closed, Resolved · Public

Description

Captcha PNG images are being sent as text/html instead of image/png.

This doesn't affect much, since browsers sniff image types, but it's still not ideal.
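
For context on why browsers still render these images: type sniffing generally looks at a file's leading bytes rather than the declared header. Below is a minimal sketch of that check for PNG, assuming only the standard 8-byte PNG signature. The function name is made up for illustration, and real browser sniffing rules are more involved than this.

```python
# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True if the payload starts with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


if __name__ == "__main__":
    # Illustrative payloads only, not real captcha data.
    print(looks_like_png(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"))
    print(looks_like_png(b"<html></html>"))
```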

Bawolff created this task. Apr 28 2017, 1:55 AM
Restricted Application added a subscriber: Aklapper. Apr 28 2017, 1:55 AM

Correction: it's being sent as Content-Type: application/x-www-form-urlencoded, not text/html.

Probably related: T188831

Krenair added a subscriber: Krenair. Edited · Mar 3 2018, 10:45 PM

I loaded the login page, and it fetched this URL:

alex@alex-laptop:~$ curl -sI 'https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:Captcha/image&wpCaptchaId=1063782475' | grep ^Content-Type
Content-Type: application/x-www-form-urlencoded
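
The same `grep ^Content-Type` step can be done on saved header text in Python. This is a rough sketch: the function name and the sample header text are illustrative, and it only handles simple single-line headers.

```python
from typing import Optional


def content_type_from_head(raw_headers: str) -> Optional[str]:
    """Return the Content-Type value from raw HTTP header text, if any."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Header names are case-insensitive; the status line has no colon.
        if sep and name.strip().lower() == "content-type":
            return value.strip()
    return None


if __name__ == "__main__":
    # Illustrative header text shaped like `curl -sI` output.
    sample = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
    )
    print(content_type_from_head(sample))
```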

So I dug the salt and hash needed to work out the object name out of memcached, using ObjectCache::getMainStashInstance()->get( wfMemcKey( 'captcha', 1063782475 ) )
The object name will be $hash[0] . '/' . $hash[1] . '/' . $hash[2] . '/' . "image_" . $salt . "_" . $hash . ".png", and it will be fetched from the global-data-captcha-render container.
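
The object-name expression above translates to Python roughly as follows. This is a sketch of just that string concatenation: the function name is invented here, and the salt and hash in the usage line are placeholders, not real values.

```python
def captcha_object_name(salt: str, hash_hex: str) -> str:
    """Build the Swift object name from the captcha's salt and hash.

    Mirrors: $hash[0] / $hash[1] / $hash[2] / image_<salt>_<hash>.png
    """
    if len(hash_hex) < 3:
        raise ValueError("hash must have at least three characters")
    filename = f"image_{salt}_{hash_hex}.png"
    return "/".join([hash_hex[0], hash_hex[1], hash_hex[2], filename])


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(captcha_object_name("abc", "0123def"))
```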

Logged into the Swift FE machine, became root, went to /etc/swift, then:

root@deployment-ms-fe02:/etc/swift# . account_AUTH_mw.env 
root@deployment-ms-fe02:/etc/swift# swift --debug download global-data-captcha-render snip/snip/snip/image_snip_snip.png --no-download 2>&1 | grep Content-Type | tail -n1
DEBUG:swiftclient:RESP HEADERS: [('Content-Length', '14037'), ('X-Object-Meta-Sha1Base36', 'op0wkkjz73i87crgjb88mfs6ae9scio'), ('X-Object-Meta-Mtime', '1343430123.000000'), ('Accept-Ranges', 'bytes'), ('Last-Modified', 'Wed, 13 Jul 2016 21:32:47 GMT'), ('Etag', '9b6adb2110169fbd83a79a7b8f6b3774'), ('X-Timestamp', '1468445566.92396'), ('Content-Type', 'application/x-www-form-urlencoded'), ('X-Trans-Id', 'tx45ec6042ab1d4c6b845a3-005a9b22dd'), ('Date', 'Sat, 03 Mar 2018 22:34:05 GMT'), ('Connection', 'keep-alive')]
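
If you'd rather not eyeball that debug line, it can be parsed. The sketch below assumes the headers are logged as a Python list of (name, value) tuples, as in this transcript. Other swiftclient versions may log a different shape, and the function name is made up.

```python
import ast
from typing import Dict

MARKER = "RESP HEADERS: "


def parse_resp_headers(debug_line: str) -> Dict[str, str]:
    """Parse a swiftclient 'RESP HEADERS: [...]' debug line into a dict."""
    _, sep, payload = debug_line.partition(MARKER)
    if not sep:
        raise ValueError("no RESP HEADERS marker in line")
    # literal_eval only accepts Python literals, so it is safe on log text.
    pairs = ast.literal_eval(payload.strip())
    return {name.lower(): value for name, value in pairs}


if __name__ == "__main__":
    # Shortened copy of the line above, for illustration.
    line = (
        "DEBUG:swiftclient:RESP HEADERS: [('Content-Length', '14037'), "
        "('Content-Type', 'application/x-www-form-urlencoded')]"
    )
    print(parse_resp_headers(line)["content-type"])
```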

So, it looks like there's no messing with the Content-Type at the retrieval end of the captcha system.

In T131012#2153946, faidon wrote:

Swift just serves whatever Content-Type it was set to the object when it was uploaded to it by MediaWiki — it never performs any content sniffing.

So the captchas get/got uploaded to Swift with the wrong mime type.
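
Since Swift stores whatever type the uploader declares, one way to avoid this is to set the type explicitly at upload time. The sketch below is hypothetical and is not MediaWiki's own upload path. It assumes `conn` behaves like python-swiftclient's `swiftclient.client.Connection`, whose `put_object` accepts a `content_type` keyword. The helper name is invented.

```python
def upload_png(conn, container: str, object_name: str, data: bytes) -> None:
    """Upload PNG bytes with an explicit Content-Type.

    `conn` is assumed to be a swiftclient.client.Connection (or anything
    with a compatible put_object method).
    """
    conn.put_object(
        container,
        object_name,
        contents=data,
        content_type="image/png",  # declared explicitly, nothing is guessed
    )
```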

That Last-Modified date is close to when I did the NFS -> Swift migration: T64835#2459268
So the file itself may be very old.
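
The two epoch values in the response headers above can be decoded to check this. The sketch assumes X-Object-Meta-Mtime records the source file's modification time and X-Timestamp records when the object was written to Swift. The helper name is made up.

```python
from datetime import datetime, timezone


def epoch_to_utc(value: str) -> datetime:
    """Convert an epoch-seconds header value to an aware UTC datetime."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc)


if __name__ == "__main__":
    # Values copied from the RESP HEADERS line above.
    file_mtime = epoch_to_utc("1343430123.000000")  # X-Object-Meta-Mtime
    uploaded = epoch_to_utc("1468445566.92396")     # X-Timestamp
    print(file_mtime.date(), uploaded.date())
```

Under those assumptions, the mtime decodes to July 2012, about four years before the July 2016 upload.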

That task shows that I ran root@deployment-ms-fe01:/data/project/upload7/private/captcha# swift upload global-data-captcha-render * (I'm so glad I kept records of important commands I was running so we could refer back to them later)
It looks like swiftclient guesses the Content-Type from the file extension, though:

root@deployment-ms-fe02:/etc/swift# touch testfile.html
root@deployment-ms-fe02:/etc/swift# swift upload global-data-captcha-render testfile.html
testfile.html
root@deployment-ms-fe02:/etc/swift# swift --debug download global-data-captcha-render testfile.html --no-download 2>&1 | grep Content-Type | tail -n1
DEBUG:swiftclient:RESP HEADERS: [('Content-Length', '0'), ('Content-Type', 'text/html'), ('Accept-Ranges', 'bytes'), ('Last-Modified', 'Sat, 03 Mar 2018 23:14:02 GMT'), ('Etag', 'd41d8cd98f00b204e9800998ecf8427e'), ('X-Timestamp', '1520118841.60094'), ('X-Object-Meta-Mtime', '1520118831.473796'), ('X-Trans-Id', 'tx46d161c44565461a9d53f-005a9b2c3e'), ('Date', 'Sat, 03 Mar 2018 23:14:06 GMT'), ('Connection', 'keep-alive')]
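
For comparison, Python's standard `mimetypes` module does this kind of extension-based guessing. This is only a sketch of the technique. The test above doesn't establish whether the guess was made by swiftclient, by a library underneath it, or by the Swift server. The function name and the fallback type are assumptions.

```python
import mimetypes


def guess_content_type(object_name: str,
                       default: str = "application/octet-stream") -> str:
    """Guess a Content-Type from the object name's extension alone."""
    guessed, _encoding = mimetypes.guess_type(object_name)
    return guessed or default


if __name__ == "__main__":
    print(guess_content_type("testfile.html"))
    print(guess_content_type("snip/snip/snip/image_snip_snip.png"))
```

By the same logic a `.png` object name would map to image/png, which is why a client or library quirk on the older host seems a more likely culprit than the file names.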

The answer probably lies in the python-swiftclient or python-requests version on the Trusty -ms-fe01 host where I originally did the migration (-ms-fe02 came along in T162247 and runs Jessie).
Let's see if we can just regenerate all the captcha images.

Krenair added a comment. Edited · Mar 4 2018, 1:49 AM

Couldn't simply follow https://wikitech.wikimedia.org/wiki/Generating_CAPTCHAs because it expects the prod (antispam) secrets to be in place (the ones in labs/private.git are predictably useless).
Had to download equivalents from a couple of helpful GitHub repos.
I'm regenerating now.

Mentioned in SAL (#wikimedia-cloud) [2018-03-04T02:12:54Z] <Krenair> Regenerated captcha images for T164047

Krenair closed this task as Resolved. Mar 4 2018, 2:13 AM
Krenair claimed this task.

Well, that did the trick:

krenair@deployment-tin:/srv/mediawiki-staging/php-master/extensions/ConfirmEdit$ mwscript extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php aawiki --wordlist=/tmp/words --font=/usr/share/fonts/truetype/freefont/FreeMonoBoldOblique.ttf --blacklist=/tmp/badwords --fill=10000 --verbose --delete
Generating 10000 new captchas.. Done.

Generated 10000 captchas in 775.4 seconds
Getting a list of old captchas to delete... Done.
Copying the new captchas to storage... Done.

Copied 10000 captchas to storage in 158.5 seconds
Deleting 10000 old captchas...
Done.

Deleted 10000 old captchas in 120.1 seconds
Removing temporary files... Done.

Whole captchas generation process took 1061.1 seconds

The script for fetching the word lists is at deployment-tin:/home/krenair/get-words-lists.sh