Page MenuHomePhabricator

Make taint-check understand array offsets
Closed, ResolvedPublic

Description

A possible implementation I've been thinking about:

  • In each possibly-array phan object, create a property like $obj->offsetTaintedness = [ 'overall' => 0, 'keys' => [] ]
  • When we find an offset assignment:
    • If we can determine the offset with 100% accuracy, add the taintedness (same $override as setTaintedness) to $obj->offsetTaintedness['keys'][ $key_being_assigned ]
    • If we cannot determine the offset, add the taintedness ($override = false) to $obj->offsetTaintedness['overall']
    • If we cannot determine an offset, but not with 100% accuracy (i.e. $idx = rand() ? 'literal' : $unknown), add it to both the key and the overall
  • When we find an offset access:
    • Always return the taintedness in 'overall'
    • If we can determine a key, OR the taintedness of that key to 'overall'
  • Perhaps handle array shape mutation (e.g. unset), but this is going to be difficult.

(Note: there might be more than one offset for both write and read operations)

This shouldn't be too hard to implement, and should work in easy cases. The main downside is that any uncertainty for a single write will affect all reads. For instance:

$arr['foo1'] = 'safe';
$arr['foo2'] = 'safe';
$arr['foo3'] = 'safe';
$arr[$unknown] = $_GET['tainted'];

echo $arr['foo1']; // Unsafe, same for foo2 and foo3

Event Timeline

Daimona triaged this task as Lowest priority.May 28 2020, 5:28 PM

I tried implementing this, a few remarks:

  • When we find an offset assignment:
    • If we can determine the offset with 100% accuracy, add the taintedness (same $override as setTaintedness) to $obj->offsetTaintedness['keys'][ $key_being_assigned ]
    • If we cannot determine the offset, add the taintedness ($override = false) to $obj->offsetTaintedness['overall']
    • If we cannot determine an offset, but not with 100% accuracy (i.e. $idx = rand() ? 'literal' : $unknown), add it to both the key and the overall
  • Phan offers ContextNode::getEquivalentPHPScalarValue to do that; however, it can only either return a scalar if it's 100% sure, or nothing, so we only have two cases
  • If no offset taint exists at all, we should return the taintedness of the variable as always.
  • visitAssign becomes very complicated.
    • It would have to handle the case with AST_DIM at the LHS, and with AST_ARRAY at the RHS, and potentially both at the same time
    • It's already a mess due to potentially having many LHS object and/or many RHS objects...
    • It should also keep setting the taintedness property, for code paths that don't read offsetTaintedness
    • Handling of $override becomes messy
  • Getting taint of an AST_ARRAY node can now have two meanings...
  • When we find an offset access:
    • Always return the taintedness in 'overall'
    • If we can determine a key, OR the taintedness of that key to 'overall'

And if we cannot determine a key, add the taintedness of all keys as well.


Given this initial bit of investigation, I think this won't happen anytime soon.

Daimona raised the priority of this task from Lowest to Medium.Jun 29 2020, 10:45 PM

Given the recent improvements of conditional branches and a few other things still under review, I'd like to retry this one. I think this is currently the biggest limitation of the plugin; fixing this would get us very close to declaring the plugin out of beta.

Perhaps I should try a different implementation, that would likely take advantage of the UnionType of arrays (i.e. storing taint data there). Unsure if it's doable, but it would be way more precise.

Change 609500 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/tools/phan/SecurityCheckPlugin@master] Analyze array elements on their own

https://gerrit.wikimedia.org/r/609500

Given the recent improvements of conditional branches and a few other things still under review, I'd like to retry this one. I think this is currently the biggest limitation of the plugin; fixing this would get us very close to declaring the plugin out of beta.

Perhaps I should try a different implementation, that would likely take advantage of the UnionType of arrays (i.e. storing taint data there). Unsure if it's doable, but it would be way more precise.

FTR, I ended up redoing the original implementation, but using value objects to store taintedness (as opposed to plain integers), so a single object can have multiple types of taintedness.

I also confirm that this is the last big limitation before moving out of beta; the rest is just fixing/improving stuff here and there, but nothing terrible.

Change 609500 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/tools/phan/SecurityCheckPlugin@master] Analyze array elements on their own

https://gerrit.wikimedia.org/r/609500

Change 609500 merged by jenkins-bot:
[mediawiki/tools/phan/SecurityCheckPlugin@master] Track taintedness of single array elements

https://gerrit.wikimedia.org/r/609500