I'm not aware of any fixes for this specific issue. I had the original author of StatCache take a look at @BBlack's comments and he said it shouldn't be possible without some kind of memory corruption, which could've been fixed (or perturbed out of existence).
Jul 22 2015
Mar 3 2015
Feb 23 2015
With everything waiting on a lock I'd suspect some kind of deadlock, but a lot of those thread backtraces don't make any sense. The one for Thread 4 in particular claims to be in code that only runs when you're building a repo ahead of time. I also don't know what's up with the threads that claim to be calling __ll_lock_wait from vector::emplace_back.
Dec 19 2014
Someone's already doing option (b) internally, which will at least turn segfaults into fatal errors. I'll let you know when the fix makes it out to master (it's pretty simple). He's also going to look at (c) but that might not end up happening if it's too complex or makes things slower.
Nov 6 2014
When I run the latest repro in an asan build, it reliably yells about a double free regardless of $n. This diff applied on top of the previous patch should fix the crash, and also another leak: https://gist.github.com/swtaarrs/2577519dbcf6e6e062a2
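For anyone following along, here is a minimal sketch of the general class of bug, not the actual HHVM or libxml2 code. It assumes a hypothetical `NodeTable` that hands out integer handles instead of raw pointers, so a second release of the same node is reported to the caller instead of reaching the allocator twice. `Node`, `NodeTable`, `create`, and `release` are all made-up names for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a parsed DOM node; this is not libxml2's xmlNode.
struct Node {
    std::string name;
};

// Owns every node it creates and hands out integer handles. Because callers
// never hold a raw pointer, releasing the same handle twice cannot free the
// same memory twice; the second call simply reports failure.
class NodeTable {
public:
    using Id = std::size_t;

    // Allocates a node and returns a fresh handle for it.
    Id create(const std::string& name) {
        Id id = next_++;
        nodes_.emplace(id, std::make_unique<Node>(Node{name}));
        return id;
    }

    // Returns true if the node was live and has now been freed,
    // false if the handle was already released (a would-be double free).
    bool release(Id id) {
        return nodes_.erase(id) == 1;
    }

    // Number of nodes that have been created but not yet released.
    std::size_t liveCount() const {
        return nodes_.size();
    }

private:
    Id next_ = 0;
    std::unordered_map<Id, std::unique_ptr<Node>> nodes_;
};
```

With this shape, calling `release` twice on the handle for an `<a>` element returns `true` and then `false`, and the caller can choose to raise a fatal error on the second call. The cost is an extra hash lookup per access, which may or may not be acceptable on a hot path.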
Nov 5 2014
Crashes are still happening with the new patch, which I kinda expected. We poked at a core file for a while and it looks like it's crashing on a double free of an <a> element with these parents: <body><div id="mainpage"><table id="mp-tfa"><tr><td><p><a href="/wiki/প্রামাণ্যচিত্র">প্রামাণ্যচিত্র</a>. The document also contains headers for "featured article" and "did you know", strongly suggesting it has something to do with this code: https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/master/includes/MobileFormatter.php.
Nov 4 2014
I rebased Tim's fix onto current master and the small repro is segfaulting now. I took a deeper look at the code and think I might have a more robust fix. I'll update as soon as I know more.
Oct 28 2014
Here's one of those graphs, from running the repro case for a while:
I can't find any evidence of that path getting hit; it looks like all the leaked memory is still coming from the call to htmlParseDocument in dom_load_html (I've been looking at call graphs from pprof's --pdf option).
Here's some stuff I wrote up before seeing @ori's most recent test case:
Oct 27 2014
Awesome, thanks for the investigation. I can take a look at fixing this unless one of you already is.