Page MenuHomePhabricator

Make taint-check understand IDatabase::insert with multiple rows
Closed, ResolvedPublicBUG REPORT

Description

See code comment:

'\Wikimedia\Rdbms\IDatabase::insert' => [
	self::SQL_EXEC_TAINT, // table name
	// FIXME This doesn't correctly work
	// when inserting multiple things at once.
	self::SQL_NUMKEY_EXEC_TAINT,
	self::SQL_EXEC_TAINT, // method name
	self::SQL_EXEC_TAINT, // options. They are not escaped
	'overall' => self::NO_TAINT
],

It can cause false positives due to how NUMKEY works.

Event Timeline

See suppressions added in r719481 for a test case.

Change 719511 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan/SecurityCheckPlugin@master] [WIP] Insert multiple rows

https://gerrit.wikimedia.org/r/719511

Change 736455 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] phan: remove unused suppression

https://gerrit.wikimedia.org/r/736455

Change 736455 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] phan: remove unused suppression

https://gerrit.wikimedia.org/r/736455

Looking at this again, I spent way more time wondering why we even need this. The $rows values are always escaped, and therefore safe. Then I looked at the hardcoded data for the other insert() methods, and there it also says: The keys names are unsafe. Now it makes more sense...

Change 960714 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan/SecurityCheckPlugin@master] Make it possible to return FunctionTaintedness from getCustomFuncTaints

https://gerrit.wikimedia.org/r/960714

So, there are actually two distinct issues to solve here. First, we need to tell taint-check that only the array keys are unsafe, and not the values. Second, that if an array with numerical keys is passed, it should look at the keys of the inner arrays, not those of the outer array. The first part is relatively easy, and I've pretty much done it already. The second part is harder, because while taint-check does use union types under some circumstances, they're kept separated from the shape of the taintedness data. In practice, this means that resolving this cleanly would be very hard. It would be easier to implement a hacky solution, with taint-check firing a hook before analyzing a method call, and overriding the taintedness of the insert() parameter based on the current $rows argument's type. Something like the existing modifyArgTaint hook used for Message|string parameters, but that would mean adding a new extension point to taint-check.

However: Database::insert is marked as @internal, and presumably it'll be deprecated in favour of InsertQueryBuilder. I noticed that IQB has two separate methods for the single row vs multiple rows cases, which means it's easy to hardcode that in taint-check.

Therefore, I will add a new hook, but mark it as internal/unstable, and then we can consider removing it once everything uses IQB (which is probably going to be a long time).

Change 960714 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Make it possible to return FunctionTaintedness from getCustomFuncTaints

https://gerrit.wikimedia.org/r/960714

Change 719511 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Special-case and improve handling of Database::insert

https://gerrit.wikimedia.org/r/719511

Change 961482 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] Remove taint-check annotations from IDatabase::insert()

https://gerrit.wikimedia.org/r/961482

Change 961482 merged by jenkins-bot:

[mediawiki/core@master] Remove taint-check annotations from IDatabase::insert()

https://gerrit.wikimedia.org/r/961482

While this is now working on the taint-check side, it still doesn't do that well in practice. The reason is that we rely on phan being able to properly infer the shape of the array, so that we can switch between the single-row and multi-rows cases. However, phan isn't always able to infer the union type of the array accurately. For instance, it gets confused when array_chunk is used (https://github.com/phan/phan/issues/4807), but that's just an example. In practice, this means that the current master of taint-check has about 20 SQLi false positives for MW core. I will look into this more.

Change 961854 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan/SecurityCheckPlugin@master] Improve shape inference for some built-in array functions

https://gerrit.wikimedia.org/r/961854

Change 961854 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Improve shape inference for some built-in array functions

https://gerrit.wikimedia.org/r/961854

Change 963108 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan/SecurityCheckPlugin@master] Track key links in assignments

https://gerrit.wikimedia.org/r/963108

Change 963108 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Track key links in assignments

https://gerrit.wikimedia.org/r/963108

Callind this done now, there's only one false positive left in MW core which is caused by array_map (which taint-check can't understand).