Page MenuHomePhabricator

Make Shellbox actually do streaming
Open, LowPublic


I may have told a few people that Shellbox will be streaming, so I'm filing this task to note that I was partly wrong about that and to document what the limitations are.

TL;DR: Shellbox has low memory usage for file transfers, but there is temporary file buffering in both directions.


PHP fully reads the request body from the client before it begins to execute userspace code. There are three ways for it to process that data as it comes in:

  • It can store it memory, in $_POST etc.
  • It can write uploaded files to a temporary directory, accessible with $_FILES. I didn't do this for Shellbox because I wanted to have the full request body available for HMAC verification.
  • It can write the whole request body to a php://temp stream. This type of stream starts off as a memory buffer, and then is written to a file when it is larger than a certain threshold. PHP rewinds it and provides access to the stream as php://input. This is what we're doing.

Shellbox's MultipartReader is streaming in the sense that it reads php://input with low memory usage, but MultipartReader cannot begin processing until all the request data has been buffered.

Fixing this would be difficult.

Guzzle Response::getBody()

Guzzle provides Response::getBody(), which returns a PSR-7 StreamInterface. However by the time this Response object is returned to the caller, the HTTP request is no longer in flight.

By default, Guzzle implements this by creating a php://temp stream. It registers a CURLOPT_WRITEFUNCTION which writes the response data to the stream. Before it returns to the client, it rewinds the stream. So reading a response body from Guzzle has essentially the same characteristics as reading php://input on the server side.

It would be possible to improve on this. MultipartReader could be rewritten with inverted control, so that it provides a write function instead of a read function. The write function would buffer in memory until headers are available, and then it would copy the part contents to a file. By wrapping this write function in a StreamInterface and passing it to the Guzzle client constructor in a "sink" option, curl will write to MultipartReader as it receives data.

If this is done, Response::getBody() would presumably be empty or otherwise non-functional. Response::getBody() would return the StreamInterface which is wrapping the custom write function, but it would not need to provide a working read or rewind function.

Command stdout/stderr

Command stdout is at present always handled as a string in memory, on both the server and client side. It would be possible to use php://temp streams for offloading stdout/stderr to disk, but it's simpler to fix callers to redirect to a file than to make stdout/stderr do streaming. MediaWiki's existing shell handling stores stdout as a string, and I didn't see a good reason to change that.