Page MenuHomePhabricator

Support large files in Shellbox
Open, Stalled, HighPublic

Description

If you need access to a file, Shellbox will include that file in the POST request, and set it up inside the temporary directory. This works pretty well, except for large files (1G+ tiffs, etc.) we're probably going to run up against various size limits...and I don't think POSTing a 1GB file is that great anyways.

One option @Joe and I had discussed was having the container download the image from swift directly, instead of having MediaWiki POST it.

MediaWiki would set an environment variable with a URL pointing to the file in swift, then the retrieveMetaData.sh script would curl it, saving it into the temporary directory. We could have k8s provision a netrc file for curl to use for athentication with swift (if we give the container read-only access to swift, we should consider splitting into 2 shellboxes for public and private wikis). This is assuming that for uploads, by the time we're trying to extract metadata it's already been uploaded into swift, which I haven't checked.

Other suggestions/approaches?

Event Timeline

@fgiunchedi I'd appreciate your input on how this would potentially interact with swift, specifically:

  • Can we create separate credentials for a Shellbox to get read-only access to MW originals in Swift?
    • Can we segment access to files for private wikis?
    • What bad things could an attacker do if had arbitrary code execution inside a Shellbox with those credentials? (aside from launching a DoS attack) I'm assuming all these files are public, so there's no risk of a data leak?
  • Would MW be able to construct a URL that Shellbox could just wget/curl to get the file? Or would it need a specialized swift client to do the download in the container?
  • Also if you have any other suggestions/alternative approaches on how to deal with these kinds of large files.

Thanks for reaching out @Legoktm

@fgiunchedi I'd appreciate your input on how this would potentially interact with swift, specifically:

  • Can we create separate credentials for a Shellbox to get read-only access to MW originals in Swift?
    • Can we segment access to files for private wikis?

Yes to both, we had a similar use case for thumbor were we had to grant access to private wiki containers (e.g. T187822). For originals on public wikis you don't need a read only account though

  • What bad things could an attacker do if had arbitrary code execution inside a Shellbox with those credentials? (aside from launching a DoS attack) I'm assuming all these files are public, so there's no risk of a data leak?

Correct, for public wikis leaking isn't a problem so overall fairly limited damage. For private wikis there's obviously the risk of data exfiltration

  • Would MW be able to construct a URL that Shellbox could just wget/curl to get the file? Or would it need a specialized swift client to do the download in the container?

Yes IIRC swift does support signed urls with expiry times, can't remember if mw implements it. I'm assuming you have private wikis in mind in this case

  • Also if you have any other suggestions/alternative approaches on how to deal with these kinds of large files.

I think overall it makes sense for shellbox to download large files, is re-uploading also a requirement? I think though at that point it'll be a regular upload through mw's API (?)

@fgiunchedi I'd appreciate your input on how this would potentially interact with swift, specifically:

  • Can we create separate credentials for a Shellbox to get read-only access to MW originals in Swift?
    • Can we segment access to files for private wikis?

Yes to both, we had a similar use case for thumbor were we had to grant access to private wiki containers (e.g. T187822). For originals on public wikis you don't need a read only account though

Great, that should make it easier to test large files on public wikis first then.

  • What bad things could an attacker do if had arbitrary code execution inside a Shellbox with those credentials? (aside from launching a DoS attack) I'm assuming all these files are public, so there's no risk of a data leak?

Correct, for public wikis leaking isn't a problem so overall fairly limited damage. For private wikis there's obviously the risk of data exfiltration

RIght. Once we have a separate shellbox-media-private deployment, you'd need to be able to somehow upload a malicious file to a private wiki to get inside the container to try an exfiltration attack. Given that users on private wikis are relatively trusted, and it would not be scalable to segment each individual private wiki, I think that's an acceptable/necessary risk.

  • Would MW be able to construct a URL that Shellbox could just wget/curl to get the file? Or would it need a specialized swift client to do the download in the container?

Yes IIRC swift does support signed urls with expiry times, can't remember if mw implements it. I'm assuming you have private wikis in mind in this case

I did not realize swift could do that, but that sounds great. I'll look into the MW side. I mostly just wanted to write the Shellbox code in a backend-agnostic way, so the only requirement would be "

I think overall it makes sense for shellbox to download large files, is re-uploading also a requirement? I think though at that point it'll be a regular upload through mw's API (?)

Not yet. For this use case, Shellbox is just extracting metadata and passing it back to MediaWiki. So the diagram is basically: MediaWiki uploads to swift -> MW shellboxes out for metadata -> Shellbox downloads from swift, runs pdfinfo/tiffinfo/etc. -> Shellbox replies to MW with metadata -> MW stores metadata in database (MariaDB).

I say "yet" because we also want to move videoscaling to a Shellbox-based system, and in that case we would be generating large transcodes that need to end up in Swift. However that's a separate project, and it's my OKR this quarter to figure out a plan on how to do that. I'm sure I'll be asking you for input on that as well once I get there :)

@fgiunchedi I'd appreciate your input on how this would potentially interact with swift, specifically:

  • Can we create separate credentials for a Shellbox to get read-only access to MW originals in Swift?
    • Can we segment access to files for private wikis?

Yes to both, we had a similar use case for thumbor were we had to grant access to private wiki containers (e.g. T187822). For originals on public wikis you don't need a read only account though

Great, that should make it easier to test large files on public wikis first then.

  • What bad things could an attacker do if had arbitrary code execution inside a Shellbox with those credentials? (aside from launching a DoS attack) I'm assuming all these files are public, so there's no risk of a data leak?

Correct, for public wikis leaking isn't a problem so overall fairly limited damage. For private wikis there's obviously the risk of data exfiltration

RIght. Once we have a separate shellbox-media-private deployment, you'd need to be able to somehow upload a malicious file to a private wiki to get inside the container to try an exfiltration attack. Given that users on private wikis are relatively trusted, and it would not be scalable to segment each individual private wiki, I think that's an acceptable/necessary risk.

  • Would MW be able to construct a URL that Shellbox could just wget/curl to get the file? Or would it need a specialized swift client to do the download in the container?

Yes IIRC swift does support signed urls with expiry times, can't remember if mw implements it. I'm assuming you have private wikis in mind in this case

I did not realize swift could do that, but that sounds great. I'll look into the MW side. I mostly just wanted to write the Shellbox code in a backend-agnostic way, so the only requirement would be "

Looks like there's a missing part after " ? But yes that feature isn't generic indeed, I found the code by grepping for temp_url (i.e. https://docs.openstack.org/swift/latest/api/temporary_url_middleware.html) and it is at SwiftFileBackend.php.

I think overall it makes sense for shellbox to download large files, is re-uploading also a requirement? I think though at that point it'll be a regular upload through mw's API (?)

Not yet. For this use case, Shellbox is just extracting metadata and passing it back to MediaWiki. So the diagram is basically: MediaWiki uploads to swift -> MW shellboxes out for metadata -> Shellbox downloads from swift, runs pdfinfo/tiffinfo/etc. -> Shellbox replies to MW with metadata -> MW stores metadata in database (MariaDB).

I say "yet" because we also want to move videoscaling to a Shellbox-based system, and in that case we would be generating large transcodes that need to end up in Swift. However that's a separate project, and it's my OKR this quarter to figure out a plan on how to do that. I'm sure I'll be asking you for input on that as well once I get there :)

Sounds great! Thanks for the additional context, metadata-only does indeed make the problem much simpler to get started with.

I did consider having swift access in Shellbox, but we didn't have a use case for it, and allowing network access and giving it a swift secret means there is more scope for malicious action inside the container, and it means you have to add a swift client dependency. So I didn't do it.

My idea as an alternative was streaming (T268427). Shellbox can handle a 10 GB file without running out of memory, as documented on that task, it's just inefficient for various reasons. The most intractable one is the fact that the client has to forward network traffic from Swift to Shellbox, but that's required to keep the same privilege model.

I think you can post a 1GB file to Shellbox. I would like to see any problems with doing that be quantified in terms of latency and resource usage before just giving Shellbox access to Swift.

I thought I had replied earlier, for now the plan is to test POSTing large files to Shellbox, identify what layers it fails at and fix those.

A basic test would be to download a big file (maybe https://commons.wikimedia.org/wiki/File:Andromeda_Galaxy_M31_-_Heic1502a_Full_resolution.tiff) to mwmaint and then run something like:

$command = MediaWikiServices::getInstance()->getShellCommandFactory()->createBoxed( 'pagedtiffhandler' )->disableNetwork()->firejailDefaultSeccomp()->routeName( 'pagedtiffhandler-metadata' );
$command->params( 'tiffinfo', 'file.tiff' );
$command->inputFileFromFile( 'file.tiff', __DIR__ . '/downloaded-file.tiff' );
$result = $command->execute();
var_dump($result);

I thought I had replied earlier, for now the plan is to test POSTing large files to Shellbox, identify what layers it fails at and fix those.

A basic test would be to download a big file (maybe https://commons.wikimedia.org/wiki/File:Andromeda_Galaxy_M31_-_Heic1502a_Full_resolution.tiff) to mwmaint and then run something like:

use MediaWiki\MediaWikiServices;
$command = MediaWikiServices::getInstance()->getShellCommandFactory()->createBoxed( 'pagedtiffhandler' )->disableNetwork()->firejailDefaultSeccomp()->routeName( 'pagedtiffhandler-metadata' );
$command->params( 'tiffinfo', 'file.tiff' );
$command->inputFileFromFile( 'file.tiff', __DIR__ . '/downloaded-file.tiff' );
$result = $command->execute();
var_dump($result);

Running this on mwmaint1002 resulted in:

>>> var_dump($result);
object(Shellbox\Command\BoxedResult)#3653 (4) {
  ["files":"Shellbox\Command\BoxedResult":private]=>
  array(0) {
  }
  ["exitCode":"Shellbox\Command\UnboxedResult":private]=>
  int(1)
  ["stdout":"Shellbox\Command\UnboxedResult":private]=>
  string(0) ""
  ["stderr":"Shellbox\Command\UnboxedResult":private]=>
  string(26) "execvp: Permission denied
"
}

@tstarling any idea of what I am doing wrong here?

Updating myself: looks like the error comes from the fact mwmaint is *not* using remote shellbox to execute scripts/retrieveMetaData.sh - see https://logstash.wikimedia.org/goto/54a14411c613702412af091616f31203

I was not able to reproduce that error.

[2240][tstarling@mwmaint1002:/home/oblivian]$ mwscript eval.php --wiki=enwiki --ignore-errors
> $command = MediaWiki\MediaWikiServices::getInstance()->getShellCommandFactory()->createBoxed( 'pagedtiffhandler' )->disableNetwork()->firejailDefaultSeccomp()->routeName( 'pagedtiffhandler-metadata' );


> $command->params( 'tiffinfo', 'file.tiff' );

> $command->inputFileFromFile( 'file.tiff', 'downloaded-file.tiff' );

> $result = $command->execute();
Caught exception Shellbox\ShellboxError: Shellbox server returned incorrect Content-Type

I moved to mwdebug1002 for a quieter tcpdump stream, and collected the response:

HTTP/1.1 200 OK
date: Wed, 05 Jan 2022 23:17:18 GMT
server: envoy
x-powered-by: PHP/7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf3
vary: Accept-Encoding
backend-timing: D=3258202 t=1641424638301363
content-type: text/html; charset=UTF-8
x-envoy-upstream-service-time: 189
transfer-encoding: chunked

c2
<br />
<b>Fatal error</b>:  Allowed memory size of 209715200 bytes exhausted (tried to allocate 199229472 bytes) in <b>/srv/app/vendor/guzzlehttp/psr7/src/Utils.php</b> on line <b>323</b><br />

Change 751851 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/libs/Shellbox@master] Fix OOM in MultipartAction when handling a large request body

https://gerrit.wikimedia.org/r/751851

Change 751851 merged by jenkins-bot:

[mediawiki/libs/Shellbox@master] Fix OOM in MultipartAction when handling a large request body

https://gerrit.wikimedia.org/r/751851

The above change still needs to be deployed, I won't have time until mid next week if someone wants to beat me to it.

Yep, you'll need to create a commit like https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/751918 for shellbox-media, +2 it, wait for the cron to pull it onto deploy1002, then helmfile -e staging apply from the helmfile.d/shellbox-media directory.

If you want, at that point you should be able to set $wgShellboxUrls['pagedtiffhandler'] = 'https://staging.svc.eqiad.wmnet:4015; and retry the command against the staging cluster...but for something pretty safe like this I'd just deploy it to eqiad/codfw clusters and just try it normally.

For the record, I'm taking care of this release, and given I am annoyed at how we manage image versions for shellbox, I'm also slightly modifying the procedure. I'll add docs to wikitech once I'm done.

Change 753021 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] shellbox-*: promote to new build

https://gerrit.wikimedia.org/r/753021

Change 753021 merged by jenkins-bot:

[operations/deployment-charts@master] shellbox-*: promote to new build

https://gerrit.wikimedia.org/r/753021

After the deployment, running the same command as before results in:

var_dump($result);
object(Shellbox\Command\BoxedResult)#675 (4) {
  ["files":"Shellbox\Command\BoxedResult":private]=>
  array(0) {
  }
  ["exitCode":"Shellbox\Command\UnboxedResult":private]=>
  int(0)
  ["stdout":"Shellbox\Command\UnboxedResult":private]=>
  string(18173) "TIFF Directory at offset 0x6c466db8 (1816554936)
  Subfile Type: (0 = 0x0)
  Image Width: 40000 Image Length: 12788
  Resolution: 72, 72 pixels/inch
  Bits/Sample: 8
...

uploading this file took 24 seconds, which is a long time but possibly reasonable for such a large file?

Now my only slight worry is generating excessive network traffic, but that can only be assessed once we have some actual traffic here.

@tstarling do you have any idea how I could assess what's the expected number of calls to shellbox-media for this function?

I would say we can consider this to be a viable route to pursue to finalize the transition to shellbox.

Is the procedure the one documented at https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments ?

I've written down a more specific guideline at https://wikitech.wikimedia.org/wiki/Shellbox#Deploying_a_new_version

I will improve what's there in the coming days, but that should be enough for a deployer to figure out what to do.

Change 753435 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] shellbox-media: bump cpus available, reduce the number of pods

https://gerrit.wikimedia.org/r/753435

Change 753435 merged by jenkins-bot:

[operations/deployment-charts@master] shellbox-media: bump cpus available, reduce the number of pods

https://gerrit.wikimedia.org/r/753435

I tried to give more resources to the shellbox container, but that didn't matter much - I guess the shellout we're running is single-threaded anyways. I can try to beef up the cpu/memory of the envoy container in front of shellbox, but that won't improve much the render times anyways.

Now I'll find out how this compares with the same operation performed locally, and I'll do better benchmarks; however it would be useful if shellbox had some basic metrics it emitted, I'll look into adding them if they're not there.

My worry is mainly that shelling out remotely in this case, given the data transfer size, will slow down further large file uploads, risking to cause more timeouts. I'll update the task when I have news.

Joe triaged this task as High priority.Jan 12 2022, 2:33 PM

I ran the command locally (I think!) on mwmaint1002, and it took a comparable time to what it took calling shellbox - apparently the transfer time is very small compared to the time it takes to prepare the file for input.
@tstarling I think it is possible that about 10 seconds are spent copying the input file into the sandbox (via InputFileFromFile) - am I correct in thinking that we should be able to avoid this copy when we execute the command remotely?

I did the following test:

  1. - Try to upload the image via Special:Upload to testwiki using "upload via url", which currently has wmgUsePagedTiffHandlerShellbox set to true, on mwdebug. It times out after 202 seconds (the time limit in php-fpm via request_terminate_timeout is 201 seconds, 202 is the timeout in apache). In this case the script seems to be called twice, with the timeout happening while executing the second call, see https://logstash.wikimedia.org/goto/ae32fc43f7b06d894cbd7c59fa6fc080
  2. - Try to upload the image to testwiki after changing manually wmgUsePagedTiffHandlerShellbox to false on mwdebug1001. This succeeds after 194 seconds, but an error is returned anyways as (IIRC) the timeout on the frontend caches is 180 seconds. So the image is uploaded but the user gets an error. This is something we've had to fix for a long time, but unrelated to the current issue. In this case the script seems to be called twice, see https://logstash.wikimedia.org/goto/e86e8520c572b4709595d1a10f12aeea
  3. - Try to upload the image to testwiki with wmgUsePagedTiffHandlerShellbox set to true but raising the timeout to 300 seconds in mediawiki, php-fpm and apache2. Still fails after 300 seconds. In this case, the script seems to be called three times, see https://logstash.wikimedia.org/goto/0db9bb9d9484149f0d231b3aec2ef701

The main issue here is that, according to both the mediawiki logs in case 2, and the envoy logs in case 3, we call pagedtiffhandler-metadata three times for a single upload, so the time penalty we pay is too large in this case.

So summarizing:

  • Shellbox supports large file uploads just fine; it is pretty slow though - from the envoy logs, for this file the whole shellbox call takes about 24 seconds. I don't have data on how much it takes if the call is local at the moment, I'll verify it in my next set of tests.
  • Overall the process with the remote calls takes over 100 seconds more than without. Given the total execution time from shellbox is ~ 110 seconds, I suspect there's more time spent in sending the request to shellbox, as proved by the fact that in a test from mwdebug the total time to get a response was around 40 seconds

I think that before working on the performance of shellbox itself, we should understand why this is called three times in a single upload (and why just 2 for a non-remote call...).

I'll try to break down the numbers better by using longer timeouts and xhgui; that should allow us to understand better where time is actually spent.

On mwdebug1002 I have set the excimer time limit (in mediawiki's code), the envoy timeout, the apache timeout, the php-fpm request_terminate_timeout, the php max_execution_time (!!!) and restarted all the relevant deamons. Using Xhgui I found the following results.

Traditional shellout for tiff metadata

XHGUI url: https://performance.wikimedia.org/xhgui/run/view?id=61e03a732e0458a23c3122b0

The request took 272 seconds, with a cpu time of 180 seconds. I think in part the absolute numbers are skewed by the use of xhgui, so I suspect that in particular cpu intensive work might have a significant overhead.

We spend:

  • 65 seconds in Shellbox\FileUtils::copy over 10 calls. Given the size of the file, not surprising. But also quite inefficient
  • 55 seconds in MultiHttpClient::runMultiCurl, which I guess is mostly the time to download the image
  • 42 seconds in FSFile::getSha1Base36 which again I think is some shellbox function.

Times can change significantly, see this second run https://performance.wikimedia.org/xhgui/run/view?id=61e03f141c2ba6bc8449d7a6

Shellbox for tiff metadata

XHGUI url: https://performance.wikimedia.org/xhgui/run/view?id=61e0419ba229c5becd0d4bbe

The request took overall 356 seconds, with a cpu time of 217 seconds.

  • 87 seconds in MultiHttpClient::runMultiCurl, so an additional 23 seconds over two additional calls
  • 75 seconds in GuzzleHttp\Handler\CurlHandler::__invoke over 4 calls. 65 additional seconds and 3 calls, seems to be fairly clear these are the calls to the remote shellbox
  • 52 seconds in Shellbox\Client::computeHmac over 3 calls, I guess all signatures for the remote shellbox calls
  • 23 seconds in FSFile::getSha1Base36, which is actually 19 seconds less, with one less call
  • 16 seconds over 2 calls to StoreFileOp::getSourceSha1Base36 - probably a different code path

I'm not sure of all the details here as I didn't go and read the source carefully, but it seems to me that we have two issues:

  • We make N calls to shellbox, when one should be enough. We call it from PagedTiffHandler::getMetaData, which is called by both UploadBase::verifyPartialFile and UploadBase::verifyFile, and from PagedTiffHandler::verifyUpload, which is called by UploadBase::verifyFile as well. We should cache this value the first time we retrieve it.
  • We compute a very expensive hmac to sign the request (and I guess, on shellbox's side, to verify the signature) which doesn't scale for such large files.

@tstarling I think it should be relatively easy to solve the former issue, but the latter seems to be an inherent limitation of how we do requests in shellbox. Thoughts?

Change 753941 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[mediawiki/extensions/PagedTiffHandler@master] Allow proper caching of PagedTiffImage and its metadata

https://gerrit.wikimedia.org/r/753941

Change 753941 merged by jenkins-bot:

[mediawiki/extensions/PagedTiffHandler@master] Allow proper caching of PagedTiffImage and its metadata

https://gerrit.wikimedia.org/r/753941

Change 754578 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/PagedTiffHandler@master] Use getSizeAndMetadata() and general style/maintenance updates

https://gerrit.wikimedia.org/r/754578

52 seconds in Shellbox\Client::computeHmac over 3 calls, I guess all signatures for the remote shellbox calls

I benchmarked the SHA-256 HMAC we're using at 5.2 seconds per gigabyte. If that's too slow, one option is to weaken the hash. On the same test, SHA-1 took 1.8 seconds, and MD5 took 1.3 seconds. Another option is to improve the implementation. PHP has its own implementation of the SHA family hash functions which are not using hardware acceleration. Intel has an article explaining how to use the Intel SHA intrinsics to implement SHA-256, including sample code.

52 seconds in Shellbox\Client::computeHmac over 3 calls, I guess all signatures for the remote shellbox calls

I benchmarked the SHA-256 HMAC we're using at 5.2 seconds per gigabyte. If that's too slow, one option is to weaken the hash. On the same test, SHA-1 took 1.8 seconds, and MD5 took 1.3 seconds. Another option is to improve the implementation. PHP has its own implementation of the SHA family hash functions which are not using hardware acceleration. Intel has an article explaining how to use the Intel SHA intrinsics to implement SHA-256, including sample code.

First let's see what happens once we've reduced the calls to shellbox. But it could be a good idea to conditionally switch to md5 for request signature when the request content-length is over a certain threshold.

There is openssl_digest() which presumably has hardware acceleration and can do SHA-256 in 2.1 seconds per gigabyte. But its input is a single string, not a stream, so it can't quite work as a drop-in replacement. At least it gives you some idea of what an optimisation patch for PHP could do.

Using a known broken hash like MD5 seems wrong in what's supposed to be a security-sensitive application. Since we are already calculating the SHA-1 hash for img_sha1, can we reuse that and only sign the SHA-1 instead of the entire file? Then the Shellbox server would verify the file matches the provided SHA-1.

(Yes, I know I complained about using broken hashes and then said to use SHA-1, but presumably this would get fixed when we move away from SHA-1 in the image table.)

Using a known broken hash like MD5 seems wrong in what's supposed to be a security-sensitive application. Since we are already calculating the SHA-1 hash for img_sha1, can we reuse that and only sign the SHA-1 instead of the entire file? Then the Shellbox server would verify the file matches the provided SHA-1.

(Yes, I know I complained about using broken hashes and then said to use SHA-1, but presumably this would get fixed when we move away from SHA-1 in the image table.)

I like your solution, but I will still argue that I didn't say MD5 lightly. MD5 is surely "broken", meaning it's possible to cause collisions, which means it's easy to solve the problem of guessing a text that gives the same cryptographic hash as yours.

But given in reality I was proposing to do something like:

signature = md5sum( secret + padding + request_body)

the attacker, in order to trick shellbox, would not need to match "signature", but rather to figure out "secret", which isn't how md5 has been broken. So I think it would be an actually valid way to assess the integrity of the request. Let me also note that depending on the length of the secret key, it is probably possible to brute-force guess it, given enough computational power, and yes using a faster hashing algorithm will reduce the computational cost of doing so.

Now, I wasn't aware we already had a sha1 of the file, so I would imagine that we could do hmac(secret, request - binary + sha1(binary)) and it would still be valid and quite more efficient. Of course that will mean that someone who intercepts a request can submit any alternative image which has the same sha1 signature...

But given in reality I was proposing to do something like:

signature = md5sum( secret + padding + request_body)

the attacker, in order to trick shellbox, would not need to match "signature", but rather to figure out "secret", which isn't how md5 has been broken. So I think it would be an actually valid way to assess the integrity of the request. Let me also note that depending on the length of the secret key, it is probably possible to brute-force guess it, given enough computational power, and yes using a faster hashing algorithm will reduce the computational cost of doing so.

It's possible to append to request_body and calculate a new signature without knowing the secret, you only need the previous signature, see https://en.wikipedia.org/wiki/Length_extension_attack . That's why the official HMAC algorithm is md5sum( secret + md5sum( secret + request_body ) ), it obscures the vulnerable hash output from the attacker. It doesn't change your point, but I thought it was worth mentioning.

Change 754578 merged by jenkins-bot:

[mediawiki/extensions/PagedTiffHandler@master] Use getSizeAndMetadata() and general style/maintenance updates

https://gerrit.wikimedia.org/r/754578

To further clarify how I conduct the no-timeout tests:

  • sudo puppet agent --disable "T292322 --$USER"

/etc/envoy/envoy.yaml:

  • Change the timeout for the local_port_80 listener in envoy to 600s
  • Change the access log status filter for shellbox-media to >= 200, instead of >= 500

For php/apache we can use the following small script (to be run as root):

#!/bin/bash
pushd /etc/php/7.2/fpm
perl -i"" -pe 's/^(max_execution_time) = (\d+)$/$1 = 600/' php.ini`
perl -i"" -pe 's/^(request_terminate_timeout) = (\d+)$/$1 = 600/' pool.d/www.conf
popd
perl -i"" -pe 's/^(Timeout) \d+$/$1 600/' /etc/apache2/apache2.conf

Finally, go to /srv/mediawiki/wmf-config/CommonSettings.php and find the part where $wgRequestTimeLimit based on the http host. Add a line at the end of that block saying
$wgRequestTimeLimit = 600

At this point, restart php, apache and envoy.

So apparently my small fixes to PagedTiffHandler shaved off about 100 seconds off of the shellbox-based request, see:

https://performance.wikimedia.org/xhgui/run/view?id=61f8e59ac77d008fdbbbcc0a

This is still not enough though. Also, somehow we're still making what looks like 2 calls to Shellbox from retrievemetadata, for a total of more than 80 seconds spent there.

Of these 80 seconds:

  • 55 were spent actually making the requests
  • 25 were spent preparing the request
  • Of the 55 seconds it took making the requests, about 51 were not spent by the backend elaborating the response, but rather by receiving the request with its 1.8 gb file attached. I'm going to drill through it to see if we might be able to improve on that.

I think this is still way too slow compared to the traditional request time.

I'll update this task later with my findings on the shellbox side once I have them, but I think we're trying to optimize a flawed mechanism - I don't think uploading files larger than a few 100 MBs should be a synchronous process.
The correct fix for this problem, and many others for large files uploaded via upload-via-url, is to make it an acynchronous process, have the jobqueue process the upload and have the upload page poll a backend for completion of the upload.

@tstarling I'm open to ideas on how to optimize the current flow though :)

I think this is still way too slow compared to the traditional request time.

I'll update this task later with my findings on the shellbox side once I have them, but I think we're trying to optimize a flawed mechanism - I don't think uploading files larger than a few 100 MBs should be a synchronous process.
The correct fix for this problem, and many others for large files uploaded via upload-via-url, is to make it an acynchronous process, have the jobqueue process the upload and have the upload page poll a backend for completion of the upload.

Agreed, that's T295007: Upload by URL should use the job queue, possibly chunked with range requests.

So after more rounds of tests:

  • Special:UploadWizard still works just fine, given it's an async process it's completely unfazed by the transition to shellbox. It still takes about 1 hour to fully upload a 2 gb file, but that's not significantly longer than without shellbox involved
  • upload-by-url didn't work for very large files because it times out already, we're just pushing the limit a bit lower. I don't know if this is acceptable though. The correct solution for this is anyways T295007 (thanks @Legoktm)
  • I'll dig a bit more in the performance of shellbox itself as I don't understand why it's taking so long in envoy compared to the time it takes on its backend, at least looking at the envoy numbers. It would look like we are filling some other buffer somewhere, not necessarily in envoy.

After more digging: I have no idea why envoy would report the upstream time spent as 2 seconds, when it really is 20. Looks like a bug there. So: most of the time in a request is actually spent processing it in shellbox as I expected.

Two things we can try to improve:

  1. give shellbox-media a bit more CPU per pod, and make the pods QoS instead of burstable
  2. make shellbox use an host emptyDir as its temporary file storage (I need to check the code, but I guess that means /tmp as usual for php applications) so that we avoid the penalty of writing a large file to the docker filesystem with copy on write layer and all its indirections.

Change 762768 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] shellbox-media: use local disk for /tmp

https://gerrit.wikimedia.org/r/762768

After more digging: I have no idea why envoy would report the upstream time spent as 2 seconds, when it really is 20. Looks like a bug there. So: most of the time in a request is actually spent processing it in shellbox as I expected.

Two things we can try to improve:

  1. give shellbox-media a bit more CPU per pod, and make the pods QoS instead of burstable

+1 on the more CPU, but as far as QoS, something's missing from the sentence. Do you want QoS Guaranteed or QoS best-effort ?

  1. make shellbox use an host emptyDir as its temporary file storage (I need to check the code, but I guess that means /tmp as usual for php applications) so that we avoid the penalty of writing a large file to the docker filesystem with copy on write layer and all its indirections.

That might work, it's one of the caveats of write-heavy apps on devicemapper (or any storage driver for that matter). I 'd be interested to look at numbers pre and post changing this.

Change 762768 merged by jenkins-bot:

[operations/deployment-charts@master] shellbox-media: use local disk for /tmp

https://gerrit.wikimedia.org/r/762768

After more digging: I have no idea why envoy would report the upstream time spent as 2 seconds, when it really is 20. Looks like a bug there. So: most of the time in a request is actually spent processing it in shellbox as I expected.

Two things we can try to improve:

  1. give shellbox-media a bit more CPU per pod, and make the pods QoS instead of burstable

+1 on the more CPU, but as far as QoS, something's missing from the sentence. Do you want QoS Guaranteed or QoS best-effort ?

D'oh, I forgot to add "guaranteed" of course.

  1. make shellbox use an host emptyDir as its temporary file storage (I need to check the code, but I guess that means /tmp as usual for php applications) so that we avoid the penalty of writing a large file to the docker filesystem with copy on write layer and all its indirections.

That might work, it's one of the caveats of write-heavy apps on devicemapper (or any storage driver for that matter). I 'd be interested to look at numbers pre and post changing this.

Testing in a few minutes, actually :)

As I feared, no significant change is seen when using an host-mounted emptyDir in the container. I would assume the shellbox server spends most of its time running the hmac verification of the request, like the client does. We might want to try to convert shellbox to use the sha1 of the file to upload when computing hmac instead of the file contents.

@tstarling @Legoktm do you think we can enable this on commons as well? The only negative effect will be to push the max size of a file we can upload-by-url down from ~ 2 GB to (I guess) ~ 1.3 GB or so, basically making an already broken-by-design system a bit more broken. Else, we might need to work on optimizing shellbox instead.

@tstarling @Legoktm do you think we can enable this on commons as well? The only negative effect will be to push the max size of a file we can upload-by-url down from ~ 2 GB to (I guess) ~ 1.3 GB or so, basically making an already broken-by-design system a bit more broken. Else, we might need to work on optimizing shellbox instead.

Do we have an idea of how many uploads that would affect? I didn't see an easy way to distinguish upload-by-url uploads so I wrote/submitted https://gerrit.wikimedia.org/r/c/mediawiki/core/+/765681/. But I think we just bite the bullet and do it, the technical reasons for security are pretty strong, and unless someone has time to prioritize fixing upload-by-url I don't think it's worth waiting indefinitely for this.

@tstarling @Legoktm do you think we can enable this on commons as well? The only negative effect will be to push the max size of a file we can upload-by-url down from ~ 2 GB to (I guess) ~ 1.3 GB or so, basically making an already broken-by-design system a bit more broken. Else, we might need to work on optimizing shellbox instead.

Do we have an idea of how many uploads that would affect? I didn't see an easy way to distinguish upload-by-url uploads so I wrote/submitted https://gerrit.wikimedia.org/r/c/mediawiki/core/+/765681/. But I think we just bite the bullet and do it, the technical reasons for security are pretty strong, and unless someone has time to prioritize fixing upload-by-url I don't think it's worth waiting indefinitely for this.

There is apparently no consensus on proceeding across the technology department, so more discussions will be needed in other avenues. I will just leave this task open in case a consensus is reached. I agree with you that the additional security provided by the switch should in itself justify the move, but management across technology thinks the decision needs to be shared further.

Change 766572 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] api: remove monitoring from http endpoint

https://gerrit.wikimedia.org/r/766572

Change 766572 merged by Giuseppe Lavagetto:

[operations/puppet@production] api: remove monitoring from http endpoint

https://gerrit.wikimedia.org/r/766572

Joe changed the task status from Open to Stalled.Mar 2 2022, 7:12 AM
Joe removed Joe as the assignee of this task.

De-assigning from myself as I can't do anything more for this task in its current status. Also reflecting it's effectively stalled waiting for a decision from WMF management / work on the development side.