
RFC: PHP microservice for containerized shell execution
Open, Medium, Public

Description

Problem

  • For security, we need better isolation of external binaries from MediaWiki.
  • If we run MediaWiki itself under Kubernetes, the resulting container should be as small as possible, so it should ideally exclude unnecessary binaries.
  • It's difficult to deploy bleeding-edge versions of external binaries when they necessarily share an OS with MediaWiki.

Proposal

Have a PHP microservice, accessible via HTTP, which takes POSTed inputs, writes them to the container's filesystem as temporary files, runs a shell command, and responds with gathered output files.

The client and server components, as well as abstractions and local execution, shall be in a new composer library called Shellbox.

The key new abstraction is called BoxedCommand. This is essentially a Command with files attached. Executing a BoxedCommand sets up a temporary directory, places input files inside that directory, executes the command, reads output files, and then cleans up the temporary directory. BoxedCommand can be instructed to either return an output file as a string, or to copy/move the file to another location.

A BoxedResult object is returned, which is similar to the existing Result object except that it also encapsulates output files.
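
As a rough sketch, the accessors below are illustrative, inferred from the usage examples elsewhere in this description, and are not a final interface:

class BoxedResult {
    private $exitCode;
    private $stdout;
    private $stderr;
    /** @var string[] Gathered output file contents, indexed by box-relative name */
    private $fileContents = [];

    public function getExitCode() {
        return $this->exitCode;
    }

    public function getStdout() {
        return $this->stdout;
    }

    public function getStderr() {
        return $this->stderr;
    }

    /** Contents of a gathered output file, or null if the command did not produce it */
    public function getFileContents( $name ) {
        return $this->fileContents[$name] ?? null;
    }
}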

A BoxedCommand can be executed remotely, via the client library, or locally. The execution code will be moved out to a separate Executor hierarchy.

class BoxedCommand {
    // ...

    public function setExecutor( BoxedExecutor $executor ) {
        $this->executor = $executor;
    }

    public function execute() {
        return $this->executor->execute( $this );
    }
}

So for callers in MediaWiki, the interface will look like this:

$result = MediaWikiServices::getInstance()
    ->getCommandFactory()
    ->getBoxedCommand( ... params ... )
    ->execute();

if ( $result->getExitCode() === 0 ) {
    $output = $result->getStdout();
}

The existing Command class is not quite suitable as a parent class of BoxedCommand, because BoxedCommand will be in a composer library whereas Command has b/c considerations and some MediaWiki dependencies. MediaWiki's Command can probably become a subclass of a loosely-coupled UnboxedCommand class within Shellbox.

I considered having the client post the data as multipart/form-data, so that the server could just use $_FILES. The problem with that is that for security, we want to have HMAC authentication of the POST contents. So my current plan is to post the data as multipart/mixed -- a binary-safe format used by email which is essentially identical to multipart/form-data -- and to parse it in PHP. That allows HMAC authentication to be done over the raw body without having to reconstruct PHP's $_FILES logic.

I previously considered making the service be aware of Swift, but I don't think there is much benefit after T260504 is done.

multipart/mixed is a nice format for file transfer because it is binary-safe. JSON is officially UTF-8, so using it to transfer binary data without base64 encoding is quite dodgy.

Structured parameters within the multipart/mixed request are transferred as a JSON part.
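
To illustrate the authentication idea (a sketch of the general approach only -- not the final wire format or header names): the client signs the raw multipart body with the shared secret, and the server verifies the signature before parsing anything.

// Client side: sign the raw multipart/mixed body with the shared secret,
// e.g. sending the result in an Authorization-style header (name hypothetical).
$signature = hash_hmac( 'sha256', $rawBody, $sharedSecret );

// Server side: recompute and compare in constant time before parsing the body.
$body = file_get_contents( 'php://input' );
$expected = hash_hmac( 'sha256', $body, $sharedSecret );
if ( !hash_equals( $expected, $receivedSignature ) ) {
    http_response_code( 403 );
    exit;
}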

File API

File names within the box are relative paths under a working directory. Running a command with files might look like this:

$result = $commandFactory->getBoxedCommand()
  ->inputFileFromString( 'input1.svg', $inputString )
  ->inputFileFromFile( 'config.json',  __DIR__ . '/config.json' )
  ->outputFileToFile( 'out1.png', TempFSFile::factory( 'out', 'png' )->getPath() )
  ->params( 'transform', "--input=input1.svg", "--conf=config.json", "--output=out1.png" )
  ->execute();
$outputStr = $result->getFileContents( 'out1.png' );

Backwards compatibility

Shell::command() and wfShellExec() will continue to execute commands locally, without using the service, even in WMF production. MediaWiki's shell command handling will be moved to the Shellbox library, which will be required by MediaWiki core as a composer dependency. There will be some refactoring but "unboxed" commands will continue to work roughly as they always have.

The new BoxedCommand abstraction will be available via CommandFactory. BoxedCommand can either execute commands locally (the default) or it can execute commands remotely using the Shellbox service.

PHP RPC

It's trivial to add an endpoint to the server which runs arbitrary PHP code. This allows PHP execution to be done with time and memory limits, for example for T240884. There will be a shared client library responsible for the protocol. The shell and RPC endpoints will wrap this client library.

Caller survey

  • Media transformation
    • Subclasses get a local copy of a file from swift and transform it, with the output being placed in a local directory for the framework to upload to swift
    • Source download in File::getThumbnailSource() has a PoolCounter to avoid network saturation
    • Thumbor theoretically replaces a lot of this, but it is loosely integrated so these code paths may end up getting hit anyway.
    • BitmapHandler
      • Scaling: runs ImageMagick with environment overrides
      • Also standalone rotation without scaling
      • Also “convert -version” with cache
    • DjVuHandler
      • Uses a pipeline ddjvu | pnmtojpeg
    • SvgHandler
      • rsvg with weird command template config
    • JpegHandler
      • Postprocess with exiftool
      • Rotation with jpegtran
    • PagedTiffHandler
      • ImageMagick
    • TimedMediaHandler
      • ffmpeg
      • fluidsynth
    • PdfHandler
      • gs | convert
    • VipsScaler
      • vips, exiv2
      • Puts a bunch of vips commands into an array, runs them all
      • The commands talk to each other via temporary files
    • 3D
      • xvfb-run 3d2png.js
      • With environment
  • Media metadata
    • Framework provides a local source path, IIRC this is prior to Swift publication so the file is already local.
    • Output goes into the DB
    • DjVuImage
      • Runs djvudump and djvutext on the source file
    • JpegHandler
      • exiftool
    • PagedTiffHandler
      • tiffinfo
    • PdfHandler
      • pdfinfo, pdftotext
  • Video transcode
    • TMH does a pcntl_fork(), then the pcntl_exec() is commented out and replaced by plain old wfShellExec()
    • The parent monitors the output file size and provides progress updates
    • Runs kill, ps directly with exec(), not via MW wrapper
    • ffmpeg, fluidsynth
    • @Joe notes: this case is complex enough to warrant its own separate service
  • Special pages
    • SecurePoll
      • gpg for encryption
    • CodeReview
      • svn
    • OpenStackManager
      • ssh-keygen via proc_open
  • Parser
    • EasyTimeline
      • Perl runs Ploticus
    • SyntaxHighlight_GeSHi
      • Poorly named extension runs pygmentize
    • Score
      • lilypond, gs, convert, fluidsynth
  • ResourceLoaderImage::rasterize()
    • Runs rsvg with proc_open
  • GlobalIdGenerator
    • Runs ifconfig to get the MAC address, with local file cache
  • GitInfo
    • Uses git to inspect the local source tree. For Special:Version etc.
  • Things that run MediaWiki maintenance scripts
    • SiteConfiguration::getConfig()
    • CirrusSearch/includes/Api/SuggestIndex.php
    • FlaggedRevs Special:ValidationStatistics
  • wfMerge()
    • Runs diff3 for conflict merging on page save
    • Uses popen() directly, no firejail
    • Makes temporary input files, reads output from pipe
  • Maintenance scripts
    • Maintenance::getTermSize(): stty
    • Maintenance::readlineEmulation(): readline
    • psysh: which less
    • mysql
    • other maintenance scripts
  • noc
    • scap wikiversions-inuse
  • Some things that are not used in WMF production:
    • $wgAntivirus
    • $wgExternalDiffEngine
    • Installer::envCheckShellLocale()
    • RandomImageGenerator
    • FSFileBackend async mode
      • cp, mv, test, chmod

Caller survey summary

Binaries used in the more tractable cases:

  • convert
  • ddjvu
  • diff3
  • djvudump
  • djvutext
  • exiftool
  • exiv2
  • ffmpeg
  • fluidsynth
  • gpg
  • gs
  • jpegtran
  • lilypond
  • pdfinfo
  • pdftotext
  • perl
  • pnmtojpeg
  • pygmentize
  • rsvg
  • ssh-keygen
  • svn
  • tiffinfo
  • vips
  • xvfb-run

More difficult or not applicable binaries

  • git
  • php
  • ps
  • kill
  • stty
  • readline
  • ipconfig

A day in the life of a typical MediaWiki shell command

  • Where is the binary? Usually it is statically configured but sometimes the PATH is used. If it’s not installed, I raise an error. Binary path configuration may be false or null, and if so, I do something else.
  • I want to know the binary version, so I run the command with --version and cache the result.
  • Here’s an input as a string. I write it to a temporary file.
  • Here’s an input as a File object. I ask it for a local copy.
  • Here’s an input which is a local file already. Easy!
  • My command uses a lot of memory. I override the memory limits.
  • I override some environment variables.
  • My command needs to run with the working directory equal to the directory containing the input and output files. I call chdir().
  • I run a command with some input files.
  • The command generates some output files. I run another command that uses those output files as an input.
  • I don’t know how many output files were generated. I need to find and gather them.
  • One of the input files was modified by the command and now I want its new contents.
  • What is the exit status?
  • Something went wrong -- an output file I expected is not there. Maybe there is something in stderr or stdout that I can show to the user.
  • All done, now I need to clean up the temporary files. Sometimes I know their names, sometimes I search for them.
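
Most of these steps map fairly directly onto the proposed BoxedCommand API. A hypothetical caller combining several of them might look like the sketch below. Note that environment() and memoryLimit() are assumed by analogy with the existing Command interface, and getFileContents() is assumed to return null for a missing output file; none of this is a final API.

$command = $commandFactory->getBoxedCommand()
  ->inputFileFromString( 'input.svg', $svgString )        // "here's an input as a string"
  ->inputFileFromFile( 'config.json', $localConfigPath )  // "here's an input which is a local file already"
  ->outputFileToString( 'output.png' )                    // gather this file after execution
  ->environment( [ 'LANG' => 'C' ] )                      // assumed: environment variable overrides
  ->memoryLimit( 512 * 1024 )                             // assumed: presumably ignored when run remotely
  ->params( $binaryPath, '--conf=config.json', '-o', 'output.png', 'input.svg' );

$result = $command->execute();

if ( $result->getExitCode() !== 0 || $result->getFileContents( 'output.png' ) === null ) {
    // "an output file I expected is not there" -- show stderr/stdout to the user or log it
    wfDebugLog( 'shellbox', $result->getStderr() );
}

// Temporary directory setup and cleanup is handled by BoxedCommand itself.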

Performance considerations

  • Startup overhead impacts fast but common commands. We presumably win by forking a smaller process prior to exec() but lose by network stack overhead and container setup.
  • Copying overhead impacts commands that run on large files
  • OS-level containers are large and defeat library sharing. However, due to the way Docker works, the containers for the different routes will be very similar and will use the same base image, so will not use much additional disk space.
  • Use a binary-safe multipart transfer encoding rather than an ASCII one such as base64, which would inflate large payloads.

Security considerations

  • Sandboxed commands can conduct timing attacks: RDTSC is unprivileged. This implies that you need either application-level security (disallowing arbitrary machine code) or separate hardware.
  • Linux is buggy and has a wide attack surface. Review the need for /proc, /sys, /dev.
  • Read-only everything, or use an overlay
  • Private /tmp
  • Requests signed with a shared secret or asymmetric encryption

Credits

This proposal is based on @Joe's investigations into running MediaWiki on Kubernetes and originated with a conversation between @Joe and @tstarling.

To do

Work in progress is in the mediawiki/libs/Shellbox project in Gerrit.

Shellbox:

  • Port FirejailCommandIntegrationTest and FirejailCommandTest from MediaWiki to Shellbox.
  • Testing of Windows support
  • Testing of systemd-run and firejail support.
  • Enable PHPUnit tests in CI
  • Parsing of shell commands to assist command validation.
  • Per-route command validation.
  • Decide whether to make Guzzle a hard dependency. If it is a hard dependency then we can provide a default HTTP client instead of just an interface.

MediaWiki integration: now tracked at T267532

Deployment:

  • Routing
  • Container creation
  • ???
  • wmf-config, service discovery etc.
  • Score as a pilot


Event Timeline

  • The service will be executed as user nobody

I would suggest using a new system user rather than nobody. The problem with nobody is that as soon as two different things on a system use it, they can attack each other. Any file owned by nobody can be written to by any service running as nobody.

  • The service will be firewalled from all network access. We might consider adding specific exceptions, but in general I'm against it

Presumably you have to at least allow inbound HTTP connections. That's a slight regression in attack surface compared to firejail, since under firejail, commands have a network namespace with only localhost, whereas this container will presumably at least need a host link.

  • Limits on execution (in terms of memory and cpu time) for each external program will be determined in configuration and enforced by kubernetes

So when the service is running in production, we want it to just set time limits for its shellouts; any attempt to work with cgroups or seccomp would fail - that needs to be done externally.

The interface I'm proposing allows the caller to specify time and memory limits. Memory limits set this way will presumably be ignored. But maybe the service could do its own setrlimit() to respect the time limits set by the caller, in addition to a (slightly higher) limit on the container. It would be nice to be able to give backwards compatible error messages when a timeout is reached. Presumably, if the container is killed, we will just see a generic HTTP 500 error.
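
As a sketch of how the service itself could respect a caller-supplied CPU time limit without touching cgroups (the same trick as MediaWiki's existing ulimit wrapper; the variable names here are invented):

// Apply the caller's CPU time limit inside the container via ulimit, leaving the
// (slightly higher) hard limit to Kubernetes. $innerCommand is the already-escaped
// command string for the boxed command.
$wrapped = '/bin/bash -c ' . escapeshellarg(
    'ulimit -t ' . intval( $cpuTimeLimitSeconds ) . '; exec ' . $innerCommand
);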

Do we need any log messages from inside the container? For example, if the service gets an error while deleting a temporary file, should it tell us about that? My plan for log messages was to buffer them and return them to the caller, but that means that if the container is killed, we lose the messages.

Will there be a new container created for each command? Or will containers be persistent and service multiple commands?

If there will be a per-command container, then we have to think about startup time. For example, I read at https://gvisor.dev/docs/architecture_guide/performance/ that gvisor has a startup time on the order of 1s. The latency can be hidden by starting containers in advance, but it still impacts the throughput.

php-fpm has a startup time of about 80ms when I measure it locally. Parse time for service startup should be less than that, especially if we avoid large dependencies like REST frameworks.

If containers are persistent and serve multiple requests, then we should consider whether a malicious command could become persistent within the container, attacking subsequent commands. That's not possible with firejail because it shuts down the per-command container with kill(-1, SIGKILL).

As mentioned above, we'd like to have separate containers for the various shellouts. This means each service will only need to respond to requests for a specific shellout. The php microservice should thus be configurable to wire its routing as follows:

  • /healthz a probe to know if the service is available/ready
  • / some form of application manifest? An OpenAPI spec?
  • /exec (or a less terrible name) the endpoint for executing the shellout.

This is an RPC service, not a REST service. Parameters can be in query strings. REST is a lot of hassle in PHP, so I don't really want to do it if there's no need for that apart from a desire to be fashionable.

I would love for the microservice to be configurable in terms of: which executable to launch within /exec, and what parameters, or parameters mask, to accept, what timeout to establish.

Such configuration should preferably not be specified as php code, but in a docker-friendly way - so possibly as environment variables.

We could have configuration that validates the parameters specified by the caller. Configuration actually in an environment variable sounds a bit weird. Maybe an environment variable could specify the path to a JSON file. So you could have a collection of JSON profiles bind-mounted into the container, and select which one you want using an environment variable.
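
For example (a sketch only; the variable name, mount path and profile keys are all made up for illustration):

// SHELLBOX_PROFILE selects one of the JSON profiles bind-mounted into the container.
$profileName = getenv( 'SHELLBOX_PROFILE' ) ?: 'default';
$path = '/etc/shellbox/profiles/' . basename( $profileName ) . '.json';
$profile = json_decode( file_get_contents( $path ), true );

// A profile might then describe the allowed binary, a parameter mask and a timeout:
// { "binary": "/usr/bin/lilypond", "allowedParams": [ "--format=png" ], "timeoutSeconds": 60 }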

I'm reconsidering the layout abstraction I had planned, with inputs/ and outputs/ subdirectories under the working directory. It's simpler to allow slashes in filenames and to allow files to go anywhere under the working directory. So you could have

$result = $commandFactory->newBoxedCommand()
  ->params( 'convert', 'input.svg', 'output.png' )
  ->addInputFileFromString( 'input.svg', $svgContents )
  ->addOutputFileToString( 'output.png' )
  ->execute();

$pngContents = $result->getFileContents( 'output.png' );

This is with string literal filenames as originally proposed, before Daniel's objections led to my last couple of task description edits. But string literals are not so bad if they are all filenames in the same namespace. If you don't like the repeated literals, you can just do

$input = 'input.svg';
$output = 'output.png';

$result = $commandFactory->newBoxedCommand()
  ->params( 'convert', $input, $output )
  ->addInputFileFromString( $input, $svgContents )
  ->addOutputFileToString( $output )
  ->execute();

$pngContents = $result->getFileContents( $output );

I don't know if we really gain much from object encapsulation of files, and it tends to break the fluent style and it makes it more difficult to figure out what the shell command will be.

One reason for avoiding inputs/ and outputs/ is that LilyPond, which is supposed to be the initial use case, puts output files into the current directory by default. It's easier to port it if we can just deal with that convention.

I don't know if we really gain much from object encapsulation of files, and it tends to break the fluent style and it makes it more difficult to figure out what the shell command will be.

I guess you are right. My proposal to model input and output files as objects was really just meant as an idea to explore, not an objection to your design.

OK, I'm adding PHP execution to the service.

Am I correct to assume that the PHP execution mode will just be a shorthand for something like php -r 'echo "Hello";'?

Perhaps like this?

$result = $commandFactory->newBoxedCommand()
  ->php( 'echo "Hello";' )
  ->execute();

Or did you think of doing something along the lines of

$result = $commandFactory->newBoxedCommand()
  ->setPhpFile( __DIR__ . '/Hello.php' )
  ->params( '--hello World' )
  ->execute();
  • The service will be firewalled from all network access. We might consider adding specific exceptions, but in general I'm against it

Presumably you have to at least allow inbound HTTP connections. That's a slight regression in attack surface compared to firejail, since under firejail, commands have a network namespace with only localhost, whereas this container will presumably at least need a host link.

Yes, I meant we will disallow any outbound network request, and only open inbound HTTP connections from the gateway, that will also be running in kubernetes.

  • Limits on execution (in terms of memory and cpu time) for each external program will be determined in configuration and enforced by kubernetes

So when the service is running in production, we want it to just set time limits for its shellouts; any attempt to work with cgroups or seccomp would fail - that needs to be done externally.

The interface I'm proposing allows the caller to specify time and memory limits. Memory limits set this way will presumably be ignored. But maybe the service could do its own setrlimit() to respect the time limits set by the caller, in addition to a (slightly higher) limit on the container. It would be nice to be able to give backwards compatible error messages when a timeout is reached. Presumably, if the container is killed, we will just see a generic HTTP 500 error.

Time limits will still be managed by the application, probably with an upper bound set by the gateway.

Do we need any log messages from inside the container? For example, if the service gets an error while deleting a temporary file, should it tell us about that? My plan for log messages was to buffer them and return them to the caller, but that means that if the container is killed, we lose the messages.

Will there be a new container created for each command? Or will containers be persistent and service multiple commands?

If there will be a per-command container, then we have to think about startup time. For example, I read at https://gvisor.dev/docs/architecture_guide/performance/ that gvisor has a startup time on the order of 1s. The latency can be hidden by starting containers in advance, but it still impacts the throughput.

php-fpm has a startup time of about 80ms when I measure it locally. Parse time for service startup should be less than that, especially if we avoid large dependencies like REST frameworks.

My current idea is that we will run these "nanoservices" as normal kubernetes services, so the containers will run until they get killed for an external reason, and be restarted if that happens. The reason for this is exactly that the startup time is too large (you also have to add the work of the kubernetes scheduler). The issue with that is of course the one you raise below:

If containers are persistent and serve multiple requests, then we should consider whether a malicious command could become persistent within the container, attacking subsequent commands. That's not possible with firejail because it shuts down the per-command container with kill(-1, SIGKILL).

This holds true. I'm still unsure how to balance it besides "the container is an empty shell anyways, which will hold no credentials nor any sensible data".

As mentioned above, we'd like to have separate containers for the various shellouts. This means each service will only need to respond to requests for a specific shellout. The php microservice should thus be configurable to wire its routing as follows:

  • /healthz a probe to know if the service is available/ready
  • / some form of application manifest? An OpenAPI spec?
  • /exec (or a less terrible name) the endpoint for executing the shellout.

This is an RPC service, not a REST service. Parameters can be in query strings. REST is a lot of hassle in PHP, so I don't really want to do it if there's no need for that apart from a desire to be fashionable.

That's just how our services are organized in production right now. The only thing I'd like to keep is the health check URL at /healthz, as that's a de-facto standard.

I would love for the microservice to be configurable in terms of: which executable to launch within /exec, and what parameters, or parameters mask, to accept, what timeout to establish.

Such configuration should preferably not be specified as php code, but in a docker-friendly way - so possibly as environment variables.

We could have configuration that validates the parameters specified by the caller. Configuration actually in an environment variable sounds a bit weird. Maybe an environment variable could specify the path to a JSON file. So you could have a collection of JSON profiles bind-mounted into the container, and select which one you want using an environment variable.

that is what I was thinking of, yes.

OK, I'm adding PHP execution to the service.

Am I correct to assume that the PHP execution mode will just be a shorthand for something like php -r 'echo "Hello";'?

No, I refactored the server so that it has actions, and the actions are called "shell" which runs a shell command, and "call" which calls a PHP function with specified parameters. They share a TemporaryDirectoryManager and input file handling. The "call" action also requires a list of PHP input files, so that's how you define the function you're calling. I preferred this over an eval() endpoint since I didn't like the idea of encoding function call parameters as PHP code. This way, they are encoded with JSON. On the client side, I imagine you will be able to specify a class name, which will be added to the file list to transfer using ReflectionClass::getFileName().

The output formats are broadly similar, but with "shell" there are output files and JSON, whereas with "call" there is just JSON, being the serialized return value.
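
In outline, the server side of the "call" action could then be roughly as follows (a sketch of the mechanism, not the actual Shellbox code; the variable names are invented):

// The request carries PHP source files defining the target function, plus
// JSON-encoded parameters; the response body is the JSON-encoded return value.
foreach ( $inputPhpFiles as $relativePath ) {
    require_once "$tempDir/$relativePath";
}
$returnValue = call_user_func_array( $functionName, $params );
echo json_encode( [ 'returnValue' => $returnValue ] );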

I don't think it still makes sense for it to all be under BoxedCommand. It's now two things: remote shell and PHP RPC. Maybe something like

$rpcService = MediaWikiServices::getInstance()->getPhpRpc();
$result = $rpcService->newCallBuilder()
  ->classes( MyRpc::class )
  ->function( [ MyRpc::class, 'execute' ] )
  ->params( 1, 2, 3 )
  ->execute();

Where $result is just the return value of MyRpc::execute(1, 2, 3).

Underlying the BoxedCommand and PHP RPC abstractions there would be a shared client, perhaps still called Shellbox.

If the name is definitely OK then I can make a repository for it and upload a WIP patch. Currently in my Shellbox directory I have a plausible-looking server with input/output file handling and HMAC authentication. The "call" endpoint is basically complete (with 25 lines of code) whereas the "shell" endpoint is a stub. In a local MediaWiki branch, I have drafts of the BoxedCommand interfaces and some refactoring for the Command/Executor split. Next I was going to write a client utility which will generate the HMAC-signed request body. Then I would stop procrastinating on Command/Executor and finish that off.

The "call" action also requires a list of PHP input files, so that's how you define the function you're calling.

So the PHP code is sent along with the command every time the command is executed? That seems wasteful, but changing that would change the paradigm, or require more complex interactions between the services... probably best left as an optimization for later.

"Shellbox" seems fine as a name.

My current idea is that we will run these "nanoservices" as normal kubernetes services, so the containers will run until they get killed for an external reason, and be restarted if that happens. The reason for this is exactly that the startup time is too large (you also have to add the work of the kubernetes scheduler). The issue with that is of course the one you raise below:

If containers are persistent and serve multiple requests, then we should consider whether a malicious command could become persistent within the container, attacking subsequent commands. That's not possible with firejail because it shuts down the per-command container with kill(-1, SIGKILL).

This holds true. I'm still unsure how to balance it besides "the container is an empty shell anyways, which will hold no credentials nor any sensible data".

Depending on the request volume (and the number of replicas) it might also be possible to kill the container (not the pod) after every request (via a livenessProbe?). If the Pod does not die, the container restart should be pretty fast as no rescheduling is done.

Has anyone got an idea for giving the HMAC key to the server without allowing the command to have access to it? Otherwise an attacker can use a command to exfiltrate the key and then spoof requests. If it's not possible, maybe we should think about asymmetric encryption.

Has anyone got an idea for giving the HMAC key to the server without allowing the command to have access to it? Otherwise an attacker can use a command to exfiltrate the key and then spoof requests. If it's not possible, maybe we should think about asymmetric encryption.

The only idea I have is to have a separate service for validating the HMAC. But that doubles the overhead...

Has anyone got an idea for giving the HMAC key to the server without allowing the command to have access to it? Otherwise an attacker can use a command to exfiltrate the key and then spoof requests. If it's not possible, maybe we should think about asymmetric encryption.

The simplest way I can think of making that possible would be to fetch the HMAC on startup from a file only readable by php-fpm, then have another user php-fpm uses to execute the binary, that can't access the file.

This of course only works if you're executing an external program, not if you're evaluating PHP code. The alternative is to use some other means of authn - maybe TLS client certs? But that complicates the setup a lot.
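
A sketch of that file-permission approach (the user names and paths are invented for illustration):

// The HMAC key file is readable only by the php-fpm user (e.g. mode 0400, owner www-data)
// and is read once at startup.
$hmacKey = trim( file_get_contents( '/etc/shellbox/hmac-key' ) );

// The boxed command itself is executed as a different unprivileged user, which cannot
// read the key file -- e.g. by prefixing the command:
$command = 'sudo -u shellbox-runner -- ' . $command;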

tstarling renamed this task from PHP microservice for containerized shell execution to RFC: PHP microservice for containerized shell execution.Aug 27 2020, 12:56 AM
tstarling added a project: TechCom-RFC.
tstarling updated the task description.

Task description edit:

  • Changed the file API again as discussed
  • Stopped describing BoxedCommand as a subclass of Command and explained why that doesn't quite work.
  • Proposed passing $this to Executor since passing each variable as a separate formal parameter is tedious and harder to maintain. We can just have accessors in BoxedCommand.
  • Mentioned the PHP call endpoint.

Change 622707 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/libs/Shellbox@master] [WIP] Shellbox

https://gerrit.wikimedia.org/r/622707

One outstanding question is what to do about the restrictions bitfield. In production, firejail will be disabled and restrictions will be ignored. But in other situations, presumably some effort should be made to respect them. Obviously NO_LOCALSETTINGS doesn't make sense, so the new wrapper in MediaWiki will respond to that by adding LocalSettings.php to a blacklist array.

In the WIP patch, restriction wrappers can be plugged in, so I used strings instead of bits, since I figured that would be more easily extensible. I kept the boolean semantics though, and I'm having second thoughts about that. Should restrictions just be replaced by an associative array of wrapper options?

There are some nice things about bits that maybe we don't want to lose. There's some validation due to the fact that constants are always used. Another alternative to using an associative array is to just have getters and setters for everything in Command. I've already taken this approach in the WIP patch to unpack limits() into separate fields. There are only 5 restrictions currently defined in MediaWiki\Shell\Shell, so converting them to getters and setters is quite feasible. But maybe it makes the calling code a little longer.

The lack of a stability guarantee in RESTRICT_DEFAULT is problematic for a library, which is why the library default in the WIP patch is to apply no restrictions. MediaWiki can apply default restrictions in its factory.

Note that Shellbox depends on monolog and core will depend on shellbox, so this will make core indirectly depend on monolog. In core at the moment, monolog is optional (although it's a dev dependency). I think loose integration between core's LoggerFactory and monolog is the source of a lot of nuisances, so I'm happy to add it as a core dependency, as part of a plan to more tightly integrate core's logging with monolog in future. Monolog is fairly large (6 kloc) but doesn't have any secondary dependencies.

Regarding restrictions again. I'm not a big fan of Firejail after my recent code review and bug reports, so I'm looking at the restriction options with a view to making them independent of the sandboxing system. Shell::SECCOMP is awkward in this respect since it disables a firejail-specific set of syscalls. It also implies no_new_privs, disabling setuid-root executables, which seems like it should be a separate option.

Several callers do leave out Shell::SECCOMP, for example with restrict(Shell::RESTRICT_NONE). I guess for b/c the simplest thing is to have Shell::SECCOMP map to more specific options:

public function restrict( $restrictions ) {
    if ( $restrictions & Shell::SECCOMP ) {
        $this->firejailDefaultSeccomp( true );
        $this->noNewPrivs( true );
    } else {
        $this->firejailDefaultSeccomp( false );
        $this->noNewPrivs( false );
    }
    ...
}

This is assuming I follow my previous idea of splitting out the restrictions bits into boolean options with their own mutators. Here Shell::SECCOMP implies noNewPrivs() so that if a wrapper supports noNewPrivs() but not firejailDefaultSeccomp(), Shell::SECCOMP will activate noNewPrivs().

Change 626548 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] [WIP] Use Shellbox for Shell::command() etc.

https://gerrit.wikimedia.org/r/626548

Change 626925 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Add BoxedCommand abstraction and Shellbox client

https://gerrit.wikimedia.org/r/626925

An open question is what to do about shell pipelines. Currently if you do Shell::command()->unsafeParams('foo|bar') then foo will run under firejail and bar will run unsandboxed. Maybe that's accidental? We could do firejail sh -c "foo|bar" but that doesn't work with NO_EXEC. We could parse the command line and disallow pipelines and lists in BoxedCommand, but that would probably lead to non-portable code as people set up their own shell wrappers. Or we could parse the command line and run each component separately under firejail, but that's the most work, and I'm not sure if anything really needs it. Or we could leave it as it is.

Probably some sort of command parsing will be needed to implement validation. Server-side command validation is an unimplemented requirement.

Change 627635 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/libs/Shellbox@master] Library for containerized shell execution (initial commit)

https://gerrit.wikimedia.org/r/627635

This RFC is up for public discussion today at 21:00 UTC (23:00 CEST, 2pm PDT).
The discussion is taking place on IRC, in the #wikimedia-office channel on freenode.

First of all, cool stuff!

Second, I noticed the following:

I previously considered making the service be aware of Swift, but I don't think there is much benefit after T260504 is done.

I may misunderstand, but would Swift-awareness still be useful for things which are not images? In other words, will the remote caller be responsible for taking the result of a processed command and then persisting it into Swift if it's non-image data if that's what it ultimately wants to do?

I don't have a strong position there and can see reasons why and why not to build logic and firewall rules in as such, was more curious, as it's a potentially common case - the common case could just as easily exist in some sort of orchestrator component, though.

Third, might it be possible to run a daemon on a spawned box for things that benefit from being warmed up? Or are the commands always supposed to be run cold (even if the box speculatively warmed up)?

Thanks @tstarling et al for the hard work, effort, and updates.

Has anyone got an idea for giving the HMAC key to the server without allowing the command to have access to it? Otherwise an attacker can use a command to exfiltrate the key and then spoof requests. If it's not possible, maybe we should think about asymmetric encryption.

The kernel keyring can have a key loaded onto it (via keyctl) which is usable for crypto operations (via AF_ALG sockets) but the key itself cannot be accessed by userspace.

The kernel keyring can have a key loaded onto it (via keyctl) which is usable for crypto operations (via AF_ALG sockets) but the key itself cannot be accessed by userspace.

Can this be used from inside a docker container?

Can this be used from inside a docker container?

I've used it with LXC containers (each container having its own keyring namespace), so it should also be possible with Docker.

An open question is what to do about shell pipelines.

I didn't see any shell pipelines in your caller survey and can't think of any off hand - is this a usecase we need to support?

Currently if you do Shell::command()->unsafeParams('foo|bar') then foo will run under firejail and bar will run unsandboxed. Maybe that's accidental? We could do firejail sh -c "foo|bar" but that doesn't work with NO_EXEC. We could parse the command line and disallow pipelines and lists in BoxedCommand, but that would probably lead to non-portable code as people set up their own shell wrappers. Or we could parse the command line and run each component separately under firejail, but that's the most work, and I'm not sure if anything really needs it. Or we could leave it as it is.

Yeah, that's accidental. I'd lean toward disallowing pipelines and requiring them to be implemented in PHP itself, though that might be problematic if whatever intermediate output is very large and can't be buffered like a real pipeline would.

I didn't see any shell pipelines in your caller survey and can't think of any off hand - is this a usecase we need to support?

I mentioned in the survey that DjvuHandler has a pipeline. It's a case that really benefits from it -- an enormous uncompressed image being compressed to JPEG before storing it to Swift.

Score has a single input and typically runs 5 separate shell commands to generate various outputs from that input. One of the intermediate files (a .wav) is potentially large. So a remote script file would be ideal for Score.

I've been thinking about how to make this cross-platform. Microsoft's Azure documentation has some pretty blunt advice on cross-platform scripting: "just use bash". We could potentially follow this advice and require bash on Windows.
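
Concretely, a scripted BoxedCommand could look something like this (method names follow the earlier examples; the script and file names are invented):

// Ship a small bash script into the box as just another input file and run it as the
// command, so intermediate files (e.g. a large .wav) never leave the container.
$result = $commandFactory->getBoxedCommand()
  ->inputFileFromString( 'generate.sh', $scriptContents )
  ->inputFileFromString( 'file.ly', $lilypondSource )
  ->outputFileToString( 'file.png' )
  ->outputFileToString( 'file.ogg' )
  ->params( 'bash', 'generate.sh' )
  ->execute();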

Change 627635 merged by Ppchelko:
[mediawiki/libs/Shellbox@master] Library for containerized shell execution (initial commit)

https://gerrit.wikimedia.org/r/627635

Change 630016 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/extensions/Score@master] [WIP] Migrate to BoxedCommand

https://gerrit.wikimedia.org/r/630016

Change 630017 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/extensions/Score@master] [WIP] Scripted score rendering

https://gerrit.wikimedia.org/r/630017

I uploaded the Score changes to give you an idea of what a moderately complex caller looks like in practice. It's lightly tested but seems to work for me.

Change 626925 abandoned by Tim Starling:
[mediawiki/core@master] Add BoxedCommand abstraction and Shellbox client

Reason:
I squashed it with the parent since I kept conflicting with myself.

https://gerrit.wikimedia.org/r/626925

Change 622707 merged by jenkins-bot:
[mediawiki/libs/Shellbox@master] Shellbox initial source files

https://gerrit.wikimedia.org/r/622707

Krinkle added a subscriber: Krinkle.

Put on Last Call until 2 December.

Task description edit: added "backwards compatibility" section explaining how this will work for default installations.

@tstarling, I was wondering if you could please weigh in on my questions above?

In the context of Wikifunctions, which needs to support multiple programming languages, one thought is to fan out sandboxed code execution, at least in part, with this sort of tech. There are alternative open source options that seem to support many of our needs, but there's the obvious opportunity cost of needing to keep those things up to date, too.

I may misunderstand, but would Swift-awareness still be useful for things which are not images? In other words, will the remote caller be responsible for taking the result of a processed command and then persisting it into Swift if it's non-image data if that's what it ultimately wants to do?

I don't have a strong position there and can see reasons why and why not to build logic and firewall rules in as such, was more curious, as it's a potentially common case - the common case could just as easily exist in some sort of orchestrator component, though.

Sure, it's just that I can only find one thing that takes the result of a command and puts it into Swift, and that is Score. Maybe Score should be migrated to do it the same way as everything else, rather than adding more things that work like Score. Score blocks wikitext parsing while command execution and Swift upload are in progress, which is not a great architecture. If Shellbox can upload directly to Swift, then that makes it privileged, partly defeating the purpose of having it. The current plan is that Swift credentials will not be accessible to sandboxed code.

Third, might it be possible to run a daemon on a spawned box for things that benefit from being warmed up? Or are the commands always supposed to be run cold (even if the box speculatively warmed up)?

I don't understand what you mean by this. Commands don't warm up. We ideally want commands to be fully isolated from each other so that they can't attack each other. It sounds like you want a service runner, rather than a command runner.

In the context of Wikifunctions, which needs to support multiple programming languages, one thought is to fan out sandboxed code execution, at least in part, with this sort of tech. There are alternative open source options that seem to support many of our needs, but there's the obvious opportunity cost of needing to keep those things up to date, too.

It would be pretty easy to add an async execute function which returns a promise instead of a plain result, if that's what you need. The Guzzle client has an async entry point which returns a promise, which can be extended by chaining. The implementation is not great -- I think you would hit performance/robustness problems if you had 100 requests in flight. For a few requests it should probably work.
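
For example, a hypothetical executeAsync() could just hand back the chained Guzzle promise (the method name and usage are invented for illustration):

// executeAsync() (hypothetical) would wrap Guzzle's requestAsync() and return a promise
// that resolves to a BoxedResult once the response has been parsed.
$promise = $commandFactory->getBoxedCommand()
  ->inputFileFromString( 'code.py', $userCode )
  ->params( 'python3', 'code.py' )
  ->executeAsync();

$stdout = $promise->then( function ( $result ) {
    return $result->getStdout();
} )->wait();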

I may misunderstand, but would Swift-awareness still be useful for things which are not images? In other words, will the remote caller be responsible for taking the result of a processed command and then persisting it into Swift if it's non-image data if that's what it ultimately wants to do?

I don't have a strong position there and can see reasons why and why not to build logic and firewall rules in as such, was more curious, as it's a potentially common case - the common case could just as easily exist in some sort of orchestrator component, though.

Sure, it's just that I can only find one thing that takes the result of a command and puts it into Swift, and that is Score. Maybe Score should be migrated to do it the same way as everything else, rather than adding more things that work like Score. Score blocks wikitext parsing while command execution and Swift upload are in progress, which is not a great architecture. If Shellbox can upload directly to Swift, then that makes it privileged, partly defeating the purpose of having it. The current plan is that Swift credentials will not be accessible to sandboxed code.

I would be against having this service have side effects (and privileged access to swift).

This service is more secure if it has access to no secret whatsoever, and also this way we avoid overstretching its goals.

Thanks @tstarling and @Joe. It sounds like this isn't the right fit for Wikifunctions. I appreciate the advice!

Change 630016 abandoned by Tim Starling:
[mediawiki/extensions/Score@master] [WIP] Migrate to BoxedCommand

Reason:
squashed

https://gerrit.wikimedia.org/r/630016

For the extensions https://www.mediawiki.org/wiki/Extension:Diagrams and https://www.mediawiki.org/wiki/Extension:Piwo the approach discussed here might be very useful.

In the proposal it says:
Have a PHP microservice, accessible via HTTP, which takes POSTed inputs, writes them to the container's filesystem as temporary files, runs a shell command, and responds with gathered output files.

Why would this have to be a PHP microservice? This would be a good opportunity to open up MediaWiki to all kinds of services written in different programming languages and avoid the dreadful dependencies on PHP and Lua that lock out so many options available out there. I think a good security model for such services is the main aspect to be offered by the service infrastructure. Whether the service will call binaries somewhere or not is IMHO just a side discussion.

The service is mainly for executing shell commands. Right now that happens through wfShellExec, typically to invoke programs such as git, ImageMagick and ffmpeg, and, in the case of SyntaxHighlight, a Python program (pygmentize).

We have effectively taken the PHP function wfShellExec and turned it into a service so that execution can happen on another server. What language this middleware service is written in doesn't really matter, as it is transparent to any program or binary run through it. The service is minimal, with no external dependencies, PECL packages, Lua, etc., and as such should be trivial to deploy, including from a security perspective.

Since it is a modern and optional service connected through service wiring in MediaWiki, if you did want the middleware implemented in a different language for some reason, an extension could replace it, as long as the replacement implements the same interface (see the sketch below).
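
For instance, an extension could redefine the command factory service from the MediaWikiServices hook (a sketch: 'ShellCommandFactory' is the service name in current core wiring, and the replacement class is hypothetical):

class MyExtensionHooks {
    // The MediaWikiServices hook is the point at which extensions may redefine services.
    public static function onMediaWikiServices( MediaWikiServices $services ) {
        $services->redefineService( 'ShellCommandFactory', static function () {
            // A drop-in replacement that implements the same CommandFactory interface
            // but delegates to a differently-implemented middleware.
            return new MyAlternativeCommandFactory();
        } );
    }
}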

Put on Last Call until 2 December.

This RFC has been approved and is now closed.

Can the task stay open to track implementation? The RFC workboard has "Approved" and "Implemented" columns so I figured it would progress through those while still being open.

Given the title and task description, I assumed it was a dedicated task, but I see it's used as tracking task indeed. Sorry about that.

Change 649651 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[mediawiki/libs/Shellbox@master] Add basic pipeline configuration

https://gerrit.wikimedia.org/r/649651

Change 649651 merged by jenkins-bot:
[mediawiki/libs/Shellbox@master] Add basic pipeline configuration

https://gerrit.wikimedia.org/r/649651

Change 661239 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/vendor@master] Add wikimedia/shellbox

https://gerrit.wikimedia.org/r/661239

Change 661239 merged by jenkins-bot:
[mediawiki/vendor@master] Add wikimedia/shellbox

https://gerrit.wikimedia.org/r/661239

Change 626548 merged by jenkins-bot:
[mediawiki/core@master] Use Shellbox for Shell::command() etc.

https://gerrit.wikimedia.org/r/626548

Change 630017 merged by jenkins-bot:
[mediawiki/extensions/Score@master] Migrate to BoxedCommand/Shellbox

https://gerrit.wikimedia.org/r/630017