Along with making captcha.py threaded in T157734 there might be further ways to make the whole process quicker.
For example, the code does numerous "store" operations in a for loop...
| Status | Assigned | Task |
|---|---|---|
| Restricted Task | | |
| Open | None | T150049 Enable $wgCaptchaDeleteOnSolve |
| Resolved | Reedy | T157736 Speed up captcha generation |
| Resolved | Florian | T157734 Add threading to captcha.py |
| Resolved | Reedy | T157738 Use doQuickOperations instead of foreach loops calling quickStore/quickStore |
| Resolved | Reedy | T157737 Add timing instrumentation to GenerateFancyCaptchas.php |
| Resolved | Reedy | T157888 Make captcha(-old)?.py python3 compatible |
[21:27:26] <Reedy> AaronSchulz: is there a way with filebackend stuff to store many files in one go?
[21:27:33] <Reedy> rather than a for loop calling quickStore?
[21:29:57] <AaronSchulz> like doQuickOperations?
    * Perform a set of independent file operations on some files.
    * b) Copy a file system file into storage
    * @code
    *     [
    *         'op'      => 'store',
    *         'src'     => <file system path, FSFile, or TempFSFile>,
    *         'dst'     => <storage path>,
    *         'headers' => <HTTP header name/value map> # since 1.21
    *     ]
    * @endcode
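To make the loop-versus-batch difference concrete, here is a rough sketch in Python. It is not MediaWiki code: `FakeBackend`, `quick_store`, `do_quick_operations`, and the paths are hypothetical stand-ins that only count calls. Only the shape of each operation (`op`, `src`, `dst`) follows the docblock above.

```python
import posixpath


class FakeBackend:
    """Hypothetical stand-in for a file backend; it only counts calls."""

    def __init__(self):
        self.calls = 0
        self.stored = {}

    def quick_store(self, src, dst):
        # One backend call per file, as in the old for loop.
        self.calls += 1
        self.stored[dst] = src

    def do_quick_operations(self, ops):
        # One backend call for the whole batch of independent operations.
        self.calls += 1
        for op in ops:
            if op["op"] == "store":
                self.stored[op["dst"]] = op["src"]


def build_store_ops(files, dst_root):
    """Build one 'store' operation per local file (paths are made up)."""
    return [
        {"op": "store", "src": src, "dst": dst_root + "/" + posixpath.basename(src)}
        for src in files
    ]


files = ["/tmp/captchas/image_%d.png" % i for i in range(5)]

# Old approach: one call per file.
looped = FakeBackend()
for src in files:
    looped.quick_store(src, "captcha-root/" + posixpath.basename(src))

# New approach: build the list once, then make a single call.
batched = FakeBackend()
batched.do_quick_operations(build_store_ops(files, "captcha-root"))

print(looped.calls, batched.calls)
```

Both backends end up holding the same files; the batched version just gets there in one call, which is where a real backend has room to pipeline or parallelise the work.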
So for a full run, generating 10k captchas
    reedy@terbium:~$ /usr/local/bin/mwscript extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php mediawikiwiki --wordlist=/etc/fancycaptcha/words --font=/usr/share/fonts/truetype/freefont/FreeMonoBoldOblique.ttf --blacklist=/etc/fancycaptcha/badwords --fill=120000 --oldcaptcha
    Current number of captchas is 110000.
    Generating 10000 new captchas.. Done.
    Generated 10000 captchas in 1594.0 seconds
    Copying the new captchas to storage... Done.
    Copied 14008 captchas to storage in 3178.9 seconds
    Removing temporary files... Done.
    Whole captchas generation process took 4775.3 seconds
| Process | Time |
|---|---|
| Generate Captcha | 26m 34s |
| Copying Captchas | 52m 59s |
| Total | 79m 35s |
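For anyone wanting to double-check the conversion from the script's seconds to the minutes in the table, a small sketch (the helper name `to_min_sec` is made up for this example, and it rounds to the nearest whole second):

```python
def to_min_sec(seconds):
    """Format a duration in seconds as 'Xm Ys', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return "%dm %02ds" % (minutes, secs)


# Timings reported by GenerateFancyCaptchas.php in the run above.
for label, seconds in [
    ("Generate Captcha", 1594.0),
    ("Copying Captchas", 3178.9),
    ("Total", 4775.3),
]:
    print(label, to_min_sec(seconds))
```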
Change 358395 had a related patch set uploaded (by Reedy; owner: Reedy):
[operations/puppet@production] Generate FancyCaptchas in 4 threads
    reedy@terbium:~$ cat /var/log/mediawiki/generate-fancycaptcha/cron.log-20170901
    Generating 10000 new captchas.. Done.
    Generated 10000 captchas in 1180.9 seconds
    Getting a list of old captchas to delete... Done.
    Copying the new captchas to storage... Done.
    Copied 10000 captchas to storage in 525.1 seconds
    Deleting 10000 old captchas... Done.
    Deleted 10000 old captchas in 354.1 seconds
    Removing temporary files... Done.
    Whole captchas generation process took 2061.3 seconds
    reedy@terbium:~$
10,000 captchas took about 34 minutes end to end.
Roughly 57% quicker overall, which is pretty good going from where we were before.
| Process | Old time (s) | New time (s) | Improvement |
|---|---|---|---|
| Generate Captcha | 1594.0 | 1180.9 | -25.9% |
| Copying Captchas | 3178.9 | 525.1 | -83.5% |
| Deleting old Captchas | 1587.4 | 354.1 | -77.7% |
| Total | 4775.3 | 2061.3 | -56.8% |
So the generation improvement came from T157734: Add threading to captcha.py, and the deleting and copying improvements came from T157738: Use doQuickOperations instead of foreach loops calling quickStore/quickStore
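A minimal sketch of the generation-side idea, splitting the work across a small pool of workers. This is not the actual captcha.py code: `render_captcha` is a hypothetical placeholder for drawing one image, and the default of 4 workers is only borrowed from the title of the Puppet change mentioned earlier.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib


def render_captcha(word):
    # Hypothetical placeholder: a real script would draw and save an image.
    # Here we just derive a fake file name from the word.
    return hashlib.sha1(word.encode("utf-8")).hexdigest()[:8] + ".png"


def generate_all(words, threads=4):
    """Render every word using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() hands words to the workers and returns results in input order.
        return list(pool.map(render_captcha, words))


words = ["word%d" % i for i in range(20)]
names = generate_all(words)
print(len(names), "captchas generated")
```

One caveat: in CPython, threads only speed up work that releases the GIL (I/O, or C extensions that drop it), so for purely CPU-bound rendering a process pool may be the better fit.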
Still need to get https://gerrit.wikimedia.org/r/#/c/358395/ reviewed and deployed. Gonna shove that in Puppet SWAT tomorrow and we'll see how we look again in a month :)
Change 358395 merged by Elukey:
[operations/puppet@production] Generate FancyCaptchas in 4 threads
So from the 1st October, 2017 run:
    Generating 10000 new captchas.. Done.
    Generated 10000 captchas in 295.8 seconds
    Getting a list of old captchas to delete... Done.
    Copying the new captchas to storage... Done.
    Copied 10000 captchas to storage in 359.9 seconds
    Deleting 10000 old captchas... Done.
    Deleted 10000 old captchas in 289.4 seconds
    Removing temporary files... Done.
    Whole captchas generation process took 946.5 seconds
lol, so we're down to just under 16 minutes, from around 80 minutes originally
| Process | Original time (s) | Time after PHP improvements (s) | Time after threading improvements (s) | Improvement vs original |
|---|---|---|---|---|
| Generate Captcha | 1594.0 | 1180.9 | 295.8 | -81.4% |
| Copying Captchas | 3178.9 | 525.1 | 359.9 | -88.7% |
| Deleting old Captchas | 1587.4 | 354.1 | 289.4 | -81.8% |
| Total | 4775.3 | 2061.3 | 946.5 | -80.2% |
Of course, the key figure here is the 81.4% decrease in the time spent generating captchas compared with the original run. That is also about 75% quicker than after the PHP improvements (1180.9 s down to 295.8 s).
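The percentages can be recomputed from the raw timings as (old - new) / old. A quick sketch, using only the numbers from the table (the helper name `pct_decrease` is made up for this example):

```python
def pct_decrease(old, new):
    """Percentage reduction going from old to new (both in seconds)."""
    return (old - new) / old * 100


# (original, after PHP improvements, after threading improvements), in seconds
timings = {
    "Generate Captcha": (1594.0, 1180.9, 295.8),
    "Copying Captchas": (3178.9, 525.1, 359.9),
    "Deleting old Captchas": (1587.4, 354.1, 289.4),
    "Total": (4775.3, 2061.3, 946.5),
}

for process, (original, after_php, after_threads) in timings.items():
    print(
        "%-22s vs original: -%.1f%%  vs PHP-only: -%.1f%%"
        % (
            process,
            pct_decrease(original, after_threads),
            pct_decrease(after_php, after_threads),
        )
    )
```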
The difference in deleting/copying captchas could just be down to terbium/swift load, I'm sure that'll vary somewhat.
That's a hell of a lot better than where we were in February.
I'm gonna make a patch so we regenerate captchas weekly as part of the continuous improvement cycle
Change 382322 had a related patch set uploaded (by Reedy; owner: Reedy):
[operations/puppet@production] Regenerate FancyCaptchas weekly rather than monthly
Change 382322 merged by Filippo Giunchedi:
[operations/puppet@production] Regenerate FancyCaptchas weekly rather than monthly
I'm closing this. There may be further improvements down the line... But just under 16 minutes to generate 10,000 captchas end to end doesn't seem bad to me