Context
In 2019, as part of a routine extension review (MachineVision: T227346), the standard MW idiom for maintenance scripts (the file header) was flagged as a "low risk" security issue by a static analysis tool (which?):
$IP = getenv( 'MW_INSTALL_PATH' ); if ( $IP === false ) { $IP = __DIR__ . '/../../..'; } require_once "$IP/maintenance/Maintenance.php"; class MigrateStuffExample extends Maintenance { public function execute() { // Actual script here // ... // ... } } $maintClass = MigrateStuffExample::class; require_once RUN_MAINTENANCE_IF_MAIN;
While we commonly ignore these tools when the specific context makes the concern a non-issue, we suggested it is probably worthwhile to fortify in this case (ref T227346#5627941). There were a number of iterations (MachineVision: one, two, three; DT: four, five), but we seem to have settled on the following emerged pattern that we now see in a few places:
// Security: Disable all stream wrappers and reenable individually as needed foreach ( stream_get_wrappers() as $wrapper ) { stream_wrapper_unregister( $wrapper ); } // Needed by the Guzzle library for some reason stream_wrapper_restore( 'php' ); // Needed by ForeignResourceManager to unpack TAR files stream_wrapper_restore( 'phar' ); stream_wrapper_restore( 'file' ); $basePath = getenv( 'MW_INSTALL_PATH' ); if ( $basePath ) { if ( !is_dir( $basePath ) || strpos( $basePath, '.' ) !== false || strpos( $basePath, '~' ) !== false ) { die( "Bad MediaWiki install path: $basePath\n" ); } } else { $basePath = __DIR__ . '/../../..'; } require_once "$basePath/maintenance/Maintenance.php"; class MigrateStuffExample extends Maintenance { public function execute() { // Actual script here // ... } } $maintClass = MigrateStuffExample::class; $doMaintenancePath = RUN_MAINTENANCE_IF_MAIN; if ( !( file_exists( $doMaintenancePath ) && realpath( $doMaintenancePath ) === realpath( "$basePath/maintenance/doMaintenance.php" ) ) ) { die( "Bad maintenance script location: $basePath\n" ); } require_once RUN_MAINTENANCE_IF_MAIN;
There is a benefit to reviewing code without the bias of MW familiarity (whether by tooling or by hand) as this avoids leaving a common pattern ignored or unquestioned. However, I think this particular issue could use a review of the risk and its proposed mitigation within its appropiate context. In this case, the context of modern PHP and MediaWiki development and production environments like WMF. (In this case, I imagine it also played a role that WMF has at times grown through teams created "in a vacuum" that lack sufficient familiarity to make an informed choice.)
I propose we use this task to seek wider input on the above, and understand the practical risk in-context. Afterwards, we could either embrace it as something valuable (and thus document and encourage by default through a code convention developer guideline, and file tasks to apply it to existing scripts in production code bases, and allow opt-out as per the development guidelines), — or agree together that it isn't something we want to advertise as a general recommendation (eg. through extension reviews, or guidelines).
Review
The underlying threat imho no longer applies to PHP software today:
- The files in question are CLI entry points, with zero statements between the top of the entry point and these "unregister" calls. Thus there is no place for a custom and compromised wrapper to have been injected via stream_wrapper_register. And no place for a compromised php:// stream to have been written to.
- The attack vector, as I understand it, would be to gain access to a mwmaint shell user's environment variables and to have modified the MW_INSTALL_PATH environment variable and set it to an http or ftp URL under control of the attacker, which the assumption that require would download and run this file as executable code. I am generally supportive of defense-in-depth, but an attacker's shell environment might be a level of access where significantly more harmful and harder-to-prevent compromise is presumed possible.
- In production, we've used the upstream-recommended allow_url_include=0 INI setting since at least 2012. This means require does not permit remotely-downloaded executable code. As such, these stream wrappers are only exposed to fopen() by default, not require or include etc. I imagine a decade ago in PHP 5 or PHP 4, upstream may've had this enabled for compatibility reasons. Over the past decade, it has gone through a similar journey as register_globals:
- Upstream advised any dev and prod environment to disable it (which we did).
- Upstream has disabled it by default through allow_url_include=0.
- Upstream has deprecated the setting entirely in PHP 7 (commit), likely tobe removed later. Thus removing it even as an opt-in attack vector.
- In production, we do not actually permit running of maintenance scripts directly from a person's shell. Instead, they go through the www-data user, MWScript.php and multiversion. Among other things, this disables environment inheritence from the shell user. Thus protecting us even if all of the above has failed.
- In production, multiversion additionally sets MW_INSTALL_PATH explicitly. Thus ignoring injection even if all of the above has failed.
Additionally:
- There is no known concern or proposal for existing maintenance scripts; including and especially the subset of scripts that we actually run in WMF production (noting that most maint scripts never run in prod). As of writing we have 15 maintenance scripts with some variation on the emerged pattern. This compared to 547 maintenance scripts in WMF-deployed code overall.
- There is no known oncern or proposal for the non-CLI code paths where user-supplied data is both likely and expected. Such as web requests to appservers where we download files by URL (Upload by URL, Flickr, etc.), or interact with FileBackend and read-write partly user-supplied file paths, or including PHP files with paths formulated by Composer or MW Autoloader.
- Three years later, people still find that our production environment actually breaks with the above boilerplate due to rejecting "wmf" branch as a bad path (e.g. DiscussionTools: T242134, T316548).
- The boilerplate is hardcoding in distributed fashion details about Guzzle.
- The boilerplate is hardcoding details about ForeignResourceManager.
- The boilerplate is hardcoding the value of core's RUN_MAINTENANCE_IF_MAIN constant. This potentially complicates T99268.
- The boilerplate effectively alters the default PHP configuration at runtime, potentially causing subtle and hard-to-find bugs with stream protocols being available for genuine fopen() use cases everywhere (in web, API, jobrunner, etc), except when the service class or hook is triggered via one of these maintenance scripts.
$ php -a php > require 'http://example.wikimedia.org/krinkle.php'; Warning: require(): http:// wrapper is disabled in the server configuration by allow_url_include=0 in php shell code on line 1 php > require 'https://example.wikimedia.org/krinkle.php'; Warning: require(): https:// wrapper is disabled in the server configuration by allow_url_include=0 in php shell code on line 1