A possible implementation I've been thinking about:
- In each possibly-array phan object, create a property like $obj->offsetTaintedness = [ 'overall' => 0, 'keys' => [] ]
- When we find an offset assignment:
  - If we can determine the offset with 100% accuracy, add the taintedness (same $override as setTaintedness) to $obj->offsetTaintedness['keys'][ $key_being_assigned ]
  - If we cannot determine the offset at all, add the taintedness ($override = false) to $obj->offsetTaintedness['overall']
  - If we can determine an offset, but not with 100% accuracy (e.g. $idx = rand() ? 'literal' : $unknown), add it to both the key and the overall
- When we find an offset access:
  - Always return the taintedness in 'overall'
  - If we can determine a key, also OR the taintedness of that key into the returned value
- Perhaps handle array shape mutation (e.g. unset), but this is going to be difficult.
(Note: there might be more than one offset for both write and read operations)
This shouldn't be too hard to implement, and should work in easy cases. The main downside is that any uncertainty for a single write will affect all reads. For instance:
$arr['foo1'] = 'safe';
$arr['foo2'] = 'safe';
$arr['foo3'] = 'safe';
$arr[$unknown] = $_GET['tainted'];
echo $arr['foo1']; // Unsafe, same for foo2 and foo3
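
To make the bookkeeping concrete, here is a rough standalone sketch of the scheme described above. It does not use any phan or plugin API: ArrayLikeObj, assignOffset(), readOffset() and the SAFE/TAINTED flags are hypothetical names made up for illustration. It also assumes that a write to an uncertain key is merged rather than overridden, which the outline above does not spell out.

```php
<?php
// Hypothetical taint flags, for illustration only (not the plugin's real constants).
const SAFE = 0;
const TAINTED = 1;

// Stand-in for a possibly-array phan object carrying the proposed property.
final class ArrayLikeObj {
	/** @var array{overall:int,keys:array<int|string,int>} */
	public array $offsetTaintedness = [ 'overall' => SAFE, 'keys' => [] ];
}

/**
 * Record a write of a value with taintedness $taint to $obj[$key].
 * $key is null when the offset is completely unknown; $certain is false when
 * $key is only one of several possible offsets.
 */
function assignOffset( ArrayLikeObj $obj, $key, bool $certain, int $taint, bool $override = true ): void {
	if ( $key !== null ) {
		$old = $obj->offsetTaintedness['keys'][$key] ?? SAFE;
		// Assumption: overriding is only done when the key is known for certain.
		$obj->offsetTaintedness['keys'][$key] = ( $certain && $override ) ? $taint : ( $old | $taint );
	}
	if ( $key === null || !$certain ) {
		// Unknown or uncertain offset: merge into 'overall', never override.
		$obj->offsetTaintedness['overall'] |= $taint;
	}
}

/** Taintedness of a read of $obj[$key]; $key is null when the offset is unknown. */
function readOffset( ArrayLikeObj $obj, $key ): int {
	// 'overall' always contributes to the result.
	$taint = $obj->offsetTaintedness['overall'];
	if ( $key !== null ) {
		$taint |= $obj->offsetTaintedness['keys'][$key] ?? SAFE;
	}
	return $taint;
}

// The example above: one write to an unknown offset taints every later read.
$arr = new ArrayLikeObj();
assignOffset( $arr, 'foo1', true, SAFE );
assignOffset( $arr, 'foo2', true, SAFE );
assignOffset( $arr, 'foo3', true, SAFE );
assignOffset( $arr, null, false, TAINTED ); // $arr[$unknown] = $_GET['tainted'];
var_dump( readOffset( $arr, 'foo1' ) === TAINTED ); // bool(true)
```

If the write to $unknown is dropped, the same read comes back as SAFE, so precision is only lost once an uncertain write happens.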