Re-evaluate stream_wrapper_unregister seen in recent maintenance scripts
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	Krinkle
	Aug 30 2022, 9:08 PM

Description

Context

In 2019, as part of a routine extension review (MachineVision: T227346), the standard MW idiom for maintenance scripts (the file header) was flagged as a "low risk" security issue by a static analysis tool (which?):

Status quo

$IP = getenv( 'MW_INSTALL_PATH' );
if ( $IP === false ) {
	$IP = __DIR__ . '/../../..';
}
require_once "$IP/maintenance/Maintenance.php";

class MigrateStuffExample extends Maintenance {
	public function execute() {
		// Actual script here
		// ...
		// ...
	}
}

$maintClass = MigrateStuffExample::class;
require_once RUN_MAINTENANCE_IF_MAIN;

While we commonly ignore these tools when the specific context makes the concern a non-issue, we suggested it is probably worthwhile to fortify in this case (ref T227346#5627941). There were a number of iterations (MachineVision: one, two, three; DT: four, five), but we seem to have settled on the following emerged pattern that we now see in a few places:

New pattern

// Security: Disable all stream wrappers and reenable individually as needed
foreach ( stream_get_wrappers() as $wrapper ) {
	stream_wrapper_unregister( $wrapper );
}
// Needed by the Guzzle library for some reason
stream_wrapper_restore( 'php' );
// Needed by ForeignResourceManager to unpack TAR files
stream_wrapper_restore( 'phar' );

stream_wrapper_restore( 'file' );
$basePath = getenv( 'MW_INSTALL_PATH' );
if ( $basePath ) {
	if ( !is_dir( $basePath )
		|| strpos( $basePath, '.' ) !== false
		|| strpos( $basePath, '~' ) !== false
	) {
		die( "Bad MediaWiki install path: $basePath\n" );
	}
} else {
	$basePath = __DIR__ . '/../../..';
}
require_once "$basePath/maintenance/Maintenance.php";

class MigrateStuffExample extends Maintenance {
	public function execute() {
		// Actual script here
		// ...
	}
}

$maintClass = MigrateStuffExample::class;

$doMaintenancePath = RUN_MAINTENANCE_IF_MAIN;
if ( !( file_exists( $doMaintenancePath ) &&
	realpath( $doMaintenancePath ) === realpath( "$basePath/maintenance/doMaintenance.php" ) ) ) {
	die( "Bad maintenance script location: $basePath\n" );
}
require_once RUN_MAINTENANCE_IF_MAIN;

There is a benefit to reviewing code without the bias of MW familiarity (whether by tooling or by hand) as this avoids leaving a common pattern ignored or unquestioned. However, I think this particular issue could use a review of the risk and its proposed mitigation within its appropiate context. In this case, the context of modern PHP and MediaWiki development and production environments like WMF. (In this case, I imagine it also played a role that WMF has at times grown through teams created "in a vacuum" that lack sufficient familiarity to make an informed choice.)

I propose we use this task to seek wider input on the above, and understand the practical risk in-context. Afterwards, we could either embrace it as something valuable (and thus document and encourage by default through a code convention developer guideline, and file tasks to apply it to existing scripts in production code bases, and allow opt-out as per the development guidelines), — or agree together that it isn't something we want to advertise as a general recommendation (eg. through extension reviews, or guidelines).

Review

The underlying threat imho no longer applies to PHP software today:

The files in question are CLI entry points, with zero statements between the top of the entry point and these "unregister" calls. Thus there is no place for a custom and compromised wrapper to have been injected via stream_wrapper_register. And no place for a compromised php:// stream to have been written to.
The attack vector, as I understand it, would be to gain access to a mwmaint shell user's environment variables and to have modified the MW_INSTALL_PATH environment variable and set it to an http or ftp URL under control of the attacker, which the assumption that require would download and run this file as executable code. I am generally supportive of defense-in-depth, but an attacker's shell environment might be a level of access where significantly more harmful and harder-to-prevent compromise is presumed possible.
In production, we've used the upstream-recommended allow_url_include=0 INI setting since at least 2012. This means require does not permit remotely-downloaded executable code. As such, these stream wrappers are only exposed to fopen() by default, not require or include etc. I imagine a decade ago in PHP 5 or PHP 4, upstream may've had this enabled for compatibility reasons. Over the past decade, it has gone through a similar journey as register_globals:
- Upstream advised any dev and prod environment to disable it (which we did).
- Upstream has disabled it by default through allow_url_include=0.
- Upstream has deprecated the setting entirely in PHP 7 (commit), likely tobe removed later. Thus removing it even as an opt-in attack vector.
In production, we do not actually permit running of maintenance scripts directly from a person's shell. Instead, they go through the www-data user, MWScript.php and multiversion. Among other things, this disables environment inheritence from the shell user. Thus protecting us even if all of the above has failed.
In production, multiversion additionally sets MW_INSTALL_PATH explicitly. Thus ignoring injection even if all of the above has failed.

Additionally:

There is no known concern or proposal for existing maintenance scripts; including and especially the subset of scripts that we actually run in WMF production (noting that most maint scripts never run in prod). As of writing we have 15 maintenance scripts with some variation on the emerged pattern. This compared to 547 maintenance scripts in WMF-deployed code overall.
There is no known oncern or proposal for the non-CLI code paths where user-supplied data is both likely and expected. Such as web requests to appservers where we download files by URL (Upload by URL, Flickr, etc.), or interact with FileBackend and read-write partly user-supplied file paths, or including PHP files with paths formulated by Composer or MW Autoloader.
Three years later, people still find that our production environment actually breaks with the above boilerplate due to rejecting "wmf" branch as a bad path (e.g. DiscussionTools: T242134, T316548).
The boilerplate is hardcoding in distributed fashion details about Guzzle.
The boilerplate is hardcoding details about ForeignResourceManager.
The boilerplate is hardcoding the value of core's RUN_MAINTENANCE_IF_MAIN constant. This potentially complicates T99268.
The boilerplate effectively alters the default PHP configuration at runtime, potentially causing subtle and hard-to-find bugs with stream protocols being available for genuine fopen() use cases everywhere (in web, API, jobrunner, etc), except when the service class or hook is triggered via one of these maintenance scripts.

Example

$ php -a
php > require 'http://example.wikimedia.org/krinkle.php';
Warning: require(): http:// wrapper is disabled in the server configuration by allow_url_include=0 in php shell code on line 1
php > require 'https://example.wikimedia.org/krinkle.php';
Warning: require(): https:// wrapper is disabled in the server configuration by allow_url_include=0 in php shell code on line 1

Related Objects

Mentioned Here: T99268: RfC: Create a proper command-line runner for MediaWiki maintenance tasks
T227346: Security readiness review for the MachineVision extension
T242134: Security Review For Talk pages project
T316548: DiscussionTools' maintenance scripts cannot be executed in Wikimedia production

Event Timeline

Krinkle created this task.Aug 30 2022, 9:08 PM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 30 2022, 9:08 PM

Normally for these sorts of things, the attack vector would be from a web prespective where the attacker controls a variable but not much else - where it is difficult to write an evil file on the filesystem, but easy to load one over http.

For the commandline version, the attack only makes sense if you can run maintenance scripts, with malicious environment variables, but cannot cause a file to be written to the filesystem (including work arounds like log files, temp files, or stuff in /proc) or directly run php or eval.php. That seems like an unlikely set of circumstances.

In T316722#8200850, @Bawolff wrote:

For the commandline version, the attack only makes sense if you can run maintenance scripts, with malicious environment variables, but cannot cause a file to be written to the filesystem or directly run php or eval.php. That seems like an unlikely set of circumstances.

This is why the issue was rated low risk within the original review. Which is automatically accepted by the WMF, per our current risk management framework. While I agree that the issue would be difficult to exploit in most environments (I believe I used that exact descriptor within the review) I would not agree that there is no risk at all. And since low risk is automatically accepted by the Foundation, no further mitigation was mandated by any policy, procedure, guideline, best practice, etc. While I would personally like to see some form of mitigation for path traversal/file system injection code weaknesses (hence my recommendation within various related reviews), if no feasible solution exists due to the complications of Wikimedia production, etc, (an issue I brought up at least a couple of times within various reviews and gerrit change sets) then the current well-intentioned mitigations can be removed without issue.

matmarex subscribed.Sep 2 2022, 12:12 AM

sbassett triaged this task as Low priority.Sep 6 2022, 4:44 PM

sbassett added a project: SecTeam-Processed.

sbassett moved this task from Incoming to Watching on the Security-Team board.

Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.Sep 26 2022, 8:23 PM

sbassett moved this task from Watching to Frozen on the Security-Team board.Mar 20 2023, 4:17 PM

Krinkle closed this task as Resolved.Aug 7 2023, 1:23 AM

Krinkle claimed this task.