Investigate why --delete seemingly deleted all but 1 captcha file when run in cronjob
Open, Needs TriagePublic

Description

Confirmed that there is only one captcha file in swift:

 tgr@terbium:~$ mwscript eval.php --wiki=metawiki
> $backend = FileBackendGroup::singleton()->get( $wgCaptchaFileBackend );
 
> $baseDir = FileBackendGroup::singleton()->get( $wgCaptchaFileBackend )->getRootStoragePath() . '/captcha-render';
 
> $fl = $backend->getFileList( [ 'dir' => $baseDir ] )
 
> var_dump(iterator_to_array($fl))
 array(1) {
   [0]=>
   string(41) "9/9/9/image_67350e1b_9996cc25d9b58fc9.png"
 }

The captcha generation script does not update/clear the cache used in FancyCaptcha::pickImage*, which it probably should, but that shouldn't be able to cause anything remotely similar.

Reedy created this task.Mar 4 2017, 4:48 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 4 2017, 4:48 PM
Reedy added a comment.Mar 4 2017, 5:21 PM

So, running a cronjob to create upto 10000... No --delete parameter, I see

Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.14/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.14/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.14/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.14/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.14/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: md5_file(/tmp/mw-fancycaptcha-1488645777-a1b4a7/a/6/1/image_0ec90eab_a614abaa81a86580.png): failed to open stream: Too many open files in /srv/mediawiki/php-1.29.0-wmf.14/includes/libs/filebackend/SwiftFileBackend.php on line 331

Do we get the feeling something happened with the copying in in batches, but the delete apparently worked fine? Not sure why it'd end up with 1 file still, but the above... At least for the last line

I filed T159610 to stop this information from cronjob runs going into the black hole that is /dev/null

Reedy added a subscriber: aaron.

Ping @aaron for the "Too many open files in /srv/mediawiki/php-1.29.0-wmf.14/includes/libs/filebackend/SwiftFileBackend.php on line 331" - do we know how many is too many? And why the layer let the script do that?

Reedy added a comment.Mar 4 2017, 5:24 PM

Do we get the feeling something happened with the copying in in batches, but the delete apparently worked fine? Not sure why it'd end up with 1 file still, but the above... At least for the last line

Or the 1 was some sort of fluke, a file that couldn't be deleted for some other reason?

do we know how many is too many?

That looks like reaching the max number of open files per process (ulimit -n) or per system. Maybe there's a leak of open descriptors in that script?

Reedy added a comment.Mar 5 2017, 11:04 PM

do we know how many is too many?

That looks like reaching the max number of open files per process (ulimit -n) or per system. Maybe there's a leak of open descriptors in that script?

Maybe. It worked before... rECOE7de460c289041e6fd729d0209c08077ab95f1b27 changed it from being a tight loop doing each copy/move operation, to passing a big array of paths on disk, with the hope it'd be more performant

aaron added a comment.Mar 9 2017, 2:46 AM

Batching itself is fine and well within reasonable ulimits (in fact FileBackend has a hard batch operation limit anyway, on top of the concurrency limit). Any issue would either be leaks or something else.

Change 341962 had a related patch set uploaded (by Aaron Schulz):
[mediawiki/core] Make sure Swift store operations close the source file handle

https://gerrit.wikimedia.org/r/341962

Reedy added a comment.Mar 9 2017, 6:41 PM
reedy@terbium:~$ /usr/local/bin/mwscript extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php mediawikiwiki --wordlist=/etc/fancycaptcha/words --font=/usr/share/fonts/truetype/freefont/FreeMonoBoldOblique.ttf --blacklist=/etc/fancycaptcha/badwords --fill=10000 --oldcaptcha
Current number of captchas is 1000.
Generating 9000 new captchas.. Done.

Generated 9000 captchas in 1459.5 seconds
Copying the new captchas to storage...Warning: md5_file(/tmp/mw-fancycaptcha-1489075661-03e546/a/8/e/image_733e516f_a8edc35843fb98e8.png): failed to open stream: Too many open files in /srv/mediawiki/php-1.29.0-wmf.15/includes/libs/filebackend/SwiftFileBackend.php on line 331
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
<snip>
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: md5_file(/tmp/mw-fancycaptcha-1489075661-03e546/a/8/e/image_733e516f_a8edc35843fb98e8.png): failed to open stream: Too many open files in /srv/mediawiki/php-1.29.0-wmf.15/includes/libs/filebackend/SwiftFileBackend.php on line 331
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: socket_sendto(): Host lookup failed [-9999]: Unknown error -9999 in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
Errored.
Warning: socket_sendto(): Host lookup failed [-10002]: Host name lookup failure in /srv/mediawiki/php-1.29.0-wmf.15/includes/debug/logger/monolog/LegacyHandler.php on line 220
Warning: implode(): Invalid arguments passed in /srv/mediawiki/php-1.29.0-wmf.15/extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php on line 197
Removing temporary files... Done.

Whole captchas generation process took 2303.2 seconds
reedy@terbium:~$

Change 341962 merged by jenkins-bot:
[mediawiki/core] Make sure Swift store operations close the source file handle

https://gerrit.wikimedia.org/r/341962

Change 342703 had a related patch set uploaded (by Reedy; owner: Aaron Schulz):
[mediawiki/core] Make sure Swift store operations close the source file handle

https://gerrit.wikimedia.org/r/342703

Change 342703 merged by jenkins-bot:
[mediawiki/core] Make sure Swift store operations close the source file handle

https://gerrit.wikimedia.org/r/342703

Mentioned in SAL (#wikimedia-operations) [2017-03-15T18:32:37Z] <legoktm@tin> Synchronized php-1.29.0-wmf.16/includes/libs/filebackend/SwiftFileBackend.php: Make sure Swift store operations close the source file handle - T159607 (duration: 00m 44s)