As discovered during investigation of T89918, HHVM does not support streaming output in any mode except CLI.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | PRODUCTION ERROR | demon | T89918 OOM reported at SpecialPage.php:534 due to large output from Special:Book
Resolved | | Joe | T84842 Convert eqiad imagescalers to HHVM, Trusty
Resolved | | tstarling | T91468 HHVM with FastCGI does not support streaming output
Event Timeline
The output functions in execution-context.cpp can be made to stream their output to the transport; I have a patch for that. But then the transport allocates an unbounded amount of memory and eventually either segfaults (presumably due to malloc failure) or gets killed by the oom-killer.
It seems that the relevant interfaces do not support streaming writes. The abstract parent class AsyncTransportWrapper has a write() method which promises either to send the buffer immediately or to queue an unlimited amount of data. It does not even keep track of how much data or how many blocks its write queue owns. It unconditionally takes ownership of the caller's buffer, and has no interface for returning ownership to the caller in the event of a buffer overrun.
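For concreteness, here is a minimal paraphrased sketch of the interface shape being described. It is illustrative, not the real folly header; IOBuf here is just a stub standing in for folly::IOBuf.

```cpp
#include <memory>

// Stub standing in for folly::IOBuf; illustrative only.
struct IOBuf {};

// Paraphrase of the write-completion callback: the transport reports
// success asynchronously, on its own thread.
class WriteCallback {
 public:
  virtual ~WriteCallback() = default;
  virtual void writeSuccess() noexcept = 0;
};

// Paraphrase of the write contract described above.
class AsyncTransportSketch {
 public:
  virtual ~AsyncTransportSketch() = default;

  // Unconditionally takes ownership of buf, then either writes it
  // immediately or queues it. No return value, no "queue full" error path,
  // and no way to hand the buffer back to the caller, so a fast producer
  // (the VM thread) can grow the write queue without bound.
  virtual void writeChain(WriteCallback* cb, std::unique_ptr<IOBuf> buf) = 0;
};
```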
FastCGISession does keep track of the number of async requests that are pending (m_eventCount), but there is no synchronisation infrastructure which would make it easy to wait for WriteCallback::writeSuccess() to be called on buffer overrun.
Also, FastCGISession calls EventBase::runInEventBaseThread() for each write request, in order to pass the write buffers to the FastCGI server thread. This again uses an unbounded queue. A separate FastCGI server thread is responsible for writing. Even if the FastCGI server thread could detect a buffer overrun and wait, it has no way to block the main app thread.
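A rough sketch of that handoff pattern, using standard-library stand-ins rather than folly's actual EventBase, to show where the second unbounded queue sits:

```cpp
#include <deque>
#include <functional>
#include <mutex>

// Stand-in for folly::EventBase::runInEventBaseThread(): the VM thread
// enqueues a closure for each write; the single FastCGI server thread
// drains the queue. Nothing in this path can push back on the producer.
class EventBaseSketch {
 public:
  // Called from the VM thread for every write request.
  void runInEventBaseThread(std::function<void()> fn) {
    std::lock_guard<std::mutex> g(mutex_);
    queue_.push_back(std::move(fn));  // unbounded: grows as fast as PHP writes
  }

  // Called in a loop on the FastCGI server thread.
  void loopOnce() {
    std::deque<std::function<void()>> batch;
    {
      std::lock_guard<std::mutex> g(mutex_);
      batch.swap(queue_);
    }
    for (auto& fn : batch) fn();  // performs the actual transport writes
  }

 private:
  std::mutex mutex_;
  std::deque<std::function<void()>> queue_;
};
```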
This goes into the too-hard basket for now. Maybe after the HHVM devs are back online after their "infra summit", we can talk about strategies.
I have had a chat with sgolemon, paulbiss and afrind, and it's back into the "maybe possible" pile.
It seems to me that emulating blocking I/O is properly a responsibility of the I/O library, but there is a precedent for emulating it in the FastCGI layer. Blocking reads are emulated in FastCGITransport::getMorePostData() using a mutex local to the transport. So blocking writes could be done in the same way.
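A minimal sketch of what that could look like, by analogy with the getMorePostData() mutex; all names here are hypothetical, not HHVM's:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical transport-local gate emulating blocking writes, the same
// way blocking reads are already emulated with a transport-local mutex.
class BlockingWriteGate {
 public:
  explicit BlockingWriteGate(size_t limit) : limit_(limit) {}

  // VM thread: block before handing `bytes` to the async transport until
  // the amount in flight drops back under the limit.
  void beforeWrite(size_t bytes) {
    std::unique_lock<std::mutex> lock(mutex_);
    // Allow an oversized single write through when nothing is in flight,
    // so a buffer larger than the limit cannot wait forever.
    cv_.wait(lock, [&] { return pending_ == 0 || pending_ + bytes <= limit_; });
    pending_ += bytes;
  }

  // Event-base thread: called from WriteCallback::writeSuccess() once
  // `bytes` have actually gone out on the wire.
  void onWriteSuccess(size_t bytes) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_ -= bytes;
    }
    cv_.notify_one();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  size_t pending_ = 0;
  const size_t limit_;
};
```

The VM thread would call beforeWrite() in the output path, and the transport's write callback would call onWriteSuccess(); that bounds in-flight memory at roughly `limit` bytes per request.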
There is the question of why the VM thread is handing off its writes to a single-threaded async server instead of having an acceptor thread grant ownership of the connection to the VM thread, which would allow a FastCGI implementation with true blocking I/O. The rationale is apparently to allow VM threads to end the request and move on to something else, while the output is still being sent. So you can have a large connection limit and a small VM thread pool, minimising memory usage. The benefit for us is pretty small, since we have Varnish buffering responses, but for users with HHVM directly facing the web, it could be significant.
The current plan is for me to flesh out the design a bit more and assess whether I can do this myself.
Here is the patch as it stands:
It seems to work. I've tried a print loop and readfile('/dev/zero'), with and without ob_implicit_flush(), and it does not use a significant amount of memory. I'm now rebuilding in release mode for performance testing. I'm not optimistic about performance, but it may be good enough for us as a stopgap even if it would be rejected upstream.
Well, strangely enough, the patched version is 1.7 times faster on my laptop when streaming a 1MB file with readfile(). I thought that lower system bus traffic might give it an advantage, but I didn't expect it to overcome the synchronisation overhead that I added. So maybe it will be acceptable upstream after all.
@tstarling did you test the patch against the development HEAD of HHVM? I'll try to see if it applies to our version (3.3.x), as the package needed refreshing anyway. I'll let you know if I need help backporting it.
I think it would take a lot of work to backport it, since the relevant code was severely refactored in February. I think the simplest way to do it would be to backport those changes also, and any other recent changes in hphp/runtime/server/fastcgi. The transport/server interface is a plugin boundary -- Facebook have a closed-source server implementation which connects to it, so the API should be pretty stable. Note that the commit I linked above only changes files in the fastcgi directory.
Another option would be to upgrade the whole of HHVM.
I think we should upgrade HHVM to 3.6 as soon as possible when it comes out. It will make 95% of our patches redundant, a fact that will make me very happy :)
Given that 3.6 should be out in a couple of weeks, I can wait for it as far as converting the imagescalers is concerned. If we can live with the other issues we have for around one month, I would wait for that, but let's see what the feedback will be.
As I reported at T89918, I have had a closer look at the 5xx logs, and they suggest that there is not a huge amount of urgency to pushing out this patch.
How difficult would it be to apply the patch to testwiki? I'm still actively investigating T90599 which might be caused by this. It's reproducible on testwiki. If we can confirm with certainty that it's this hhvm bug causing it, it would save me a lot of time spent trying to figure out the upload stash issue and it would allow us to give a clear ETA to Commons users.
@Gilles I just found out that HHVM 3.6.0 was released the day after I last checked. So I will devote some time to building a new package in the next few days.
With segfault fix:
Segfault fix only:
```
diff --git a/hphp/runtime/base/execution-context.cpp b/hphp/runtime/base/execution-context.cpp
index ee2a793..11146ba 100644
--- a/hphp/runtime/base/execution-context.cpp
+++ b/hphp/runtime/base/execution-context.cpp
@@ -315,6 +315,10 @@ void ExecutionContext::obClean(int handler_flag) {
 
 bool ExecutionContext::obFlush() {
   assert(m_protectedLevel >= 0);
+  if (m_buffers.empty()) {
+    return false;
+  }
+
   auto iter = m_buffers.end();
   OutputBuffer& last = *(--iter);
```
This seems to be the patch from above: https://phabricator.wikimedia.org/diffusion/ODAZ/browse/master/debian/patches/output-streaming.patch. That means it is included in what we are running on CI, and I assume also in production (I didn't check). (So yes, it has happened.) In the context of the blocked tasks, this means the patch needs to be verified to fix the issues. Also, someone needs to try to get this upstream, or at least open an upstream bug that references our non-upstreamable solution.