MediaWiki waits forever for a memcached connection
Closed, Resolved, Public

Description

Today a number of DB transactions from mw1207 were left hanging open. The offending Apache threads showed this stack trace:

(gdb) zbacktrace
[0xbe6a1190] usleep()
/usr/local/apache/common-local/php-1.23wmf18/includes/objectcache/BagOStuff.php:188
[0xbe6a0ae8] lock()
/usr/local/apache/common-local/php-1.23wmf18/includes/filerepo/file/LocalFile.php:1832
[0xbe6a00b0] lock()
/usr/local/apache/common-local/php-1.23wmf18/includes/filerepo/file/LocalFile.php:1164
[0xbe69fc30] upload()
/usr/local/apache/common-local/php-1.23wmf18/includes/upload/UploadBase.php:692
[0xbe69e790] performUpload()
/usr/local/apache/common-local/php-1.23wmf18/includes/api/ApiUpload.php:649
[0xbe69e3e0] performUpload()
/usr/local/apache/common-local/php-1.23wmf18/includes/api/ApiUpload.php:144
[0xbe69cc98] getContextResult()
/usr/local/apache/common-local/php-1.23wmf18/includes/api/ApiUpload.php:111
[0xbe69c610] execute()
/usr/local/apache/common-local/php-1.23wmf18/includes/api/ApiMain.php:900
[0xbe69c050] executeAction()
/usr/local/apache/common-local/php-1.23wmf18/includes/api/ApiMain.php:364
[0xbe69be48] executeActionWithErrorHandling()
/usr/local/apache/common-local/php-1.23wmf18/includes/api/ApiMain.php:335
[0xbe69aee8] execute()
/usr/local/apache/common-local/php-1.23wmf18/api.php:86
[0xbe69add8] ??? /usr/local/apache/common-local/w/api.php:3

Paravoid noticed that twemproxy on mw1207 had stopped listening on port 11211. Restarting it let the open transactions complete and the Apache threads continue as normal.

MediaWiki shouldn't wait forever when memcached is not responding.
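For illustration: the backtrace shows the thread sleeping in usleep() at BagOStuff.php:188, inside a lock() call, i.e. in a loop that keeps retrying the lock. Below is a minimal sketch of a polling lock with a deadline, assuming a memcached-style add() that only stores a key when it does not already exist. InMemoryStore, acquire_lock(), the ":lock" key suffix and the timeout/poll_interval values are hypothetical illustrations, not MediaWiki's BagOStuff API.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for a memcached client (not MediaWiki's BagOStuff)."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, exptime=0):
        # memcached-style "add": store only if the key does not exist yet.
        # exptime is accepted for familiarity but ignored by this toy store.
        if key in self._data:
            return False
        self._data[key] = value
        return True


def acquire_lock(store, key, timeout=5.0, poll_interval=0.05):
    """Hypothetical helper: poll for the lock, but give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if store.add(key + ":lock", 1, exptime=int(timeout) + 1):
            return True
        if time.monotonic() >= deadline:
            return False  # report failure to the caller instead of sleeping on
        time.sleep(poll_interval)


store = InMemoryStore()
print(acquire_lock(store, "File:Example.jpg", timeout=0.2))  # True: lock was free
print(acquire_lock(store, "File:Example.jpg", timeout=0.2))  # False after ~0.2 s: lock is held
```

The caller then has to handle a False return (e.g. by reporting an error for the upload) rather than assuming the lock was obtained.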


Version: 1.23.0
Severity: major

bzimport set Reference to bz63058.
Springle created this task. (via Legacy, Mar 25 2014, 11:34 AM)
gerritbot added a comment. (via Conduit, Mar 31 2014, 8:39 PM)

Change 122550 had a related patch set uploaded by Aaron Schulz:
Speed up LocalFile locking behavior

https://gerrit.wikimedia.org/r/122550

gerritbot added a comment. (via Conduit, Apr 1 2014, 9:32 AM)

Change 122550 merged by jenkins-bot:
Speed up LocalFile locking behavior

https://gerrit.wikimedia.org/r/122550

gerritbot added a comment. (via Conduit, Apr 1 2014, 8:27 PM)

Change 123041 had a related patch set uploaded by Aaron Schulz:
Made BagOStuff fail fast in cas/lock on certain errors

https://gerrit.wikimedia.org/r/123041

gerritbot added a comment. (via Conduit, Apr 2 2014, 9:50 PM)

Change 123041 merged by jenkins-bot:
Made BagOStuff fail fast in cas/lock on certain errors

https://gerrit.wikimedia.org/r/123041
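Without error handling, a lock loop built on add() cannot tell "someone else holds the lock" apart from "the cache is unreachable", so it keeps retrying in both cases. Going only by the title of change 123041, the idea is to stop retrying on certain errors. Below is a minimal sketch of that idea, assuming a client whose add() raises an exception on backend errors but returns False when the key already exists. StoreUnavailable, DownStore, BusyStore and acquire_lock() are hypothetical names; this is not the code of the actual patch.

```python
import time


class StoreUnavailable(Exception):
    """Hypothetical error: the cache backend could not be reached."""


class DownStore:
    """Hypothetical client whose backend refuses connections (e.g. proxy not listening)."""

    def add(self, key, value):
        raise StoreUnavailable("connection refused")


class BusyStore:
    """Hypothetical client where the lock key is already held by another request."""

    def add(self, key, value):
        return False


def acquire_lock(store, key, timeout=5.0, poll_interval=0.05):
    """Hypothetical helper: retry while the lock is busy, fail fast on backend errors."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if store.add(key + ":lock", 1):
                return True
        except StoreUnavailable:
            # Retrying cannot help while the cache is down, so give up at once.
            return False
        if time.monotonic() >= deadline:
            return False  # lock is still held by someone else
        time.sleep(poll_interval)


print(acquire_lock(DownStore(), "File:Example.jpg"))               # False, without waiting out the timeout
print(acquire_lock(BusyStore(), "File:Example.jpg", timeout=0.2))  # False, after ~0.2 s of retries
```

Which errors should count as "fail fast" is a judgment call; this sketch treats every backend exception that way and keeps the bounded retry only for a genuinely busy lock.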

Aklapper added a comment. (via Conduit, Apr 4 2014, 11:10 PM)

Aaron: Both patches have been merged. Should we wait to check whether this still happens and whether more work is needed, or can this issue be considered fixed?
