Speed is really not very important for this. But I know there are a couple of blocks of code where it basically does the same thing over and over again unnecessarily. I don't think spending much time on this is warranted, but once the plugin becomes more feature-complete, it's probably worth doing at least one round of profiling and checking for very low-hanging fruit.
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | None | | T280990 Run phan as part of composer test, rather than in bespoke CI jobs |
| Resolved | | Daimona | T203651 Optimize phan-taint-check speed |
Event Timeline
The switch to phan's PluginV2 as part of T216974 should boost the speed per PluginV2 documentation.
So I did some testing, running seccheck on AbuseFilter with mwext-fast. The runtime was 103 seconds. Then I removed seccheck from phan's config and got a runtime of 72 seconds, which should be our lower bound. Given that 30 seconds isn't that much, I guess it's really not a priority.
Yeah, that seems fairly quick to me. Obviously there are efficiency concerns here since this needs to run in CI, but 30 seconds doesn't seem so terrible imo. Of course if we ever try to get this working on core, that might be a different story. Do we know where AbuseFilter falls in terms of size/complexity for deployed extensions? I'd guess it's at least moderately complex.
Well, core is definitely slower... We can try running it on different extensions, but looking at previous runs it seems like AF has more or less the same runtime as other extensions (or maybe slightly higher). I also tried running seccheck (on AF) with xdebug enabled to see if I could get something useful. I ended up with a runtime of 4000 seconds and no useful data :-/
A little update: after the update to phan 2.2.13, taint-check will start taking way more memory. Scanning AbuseFilter with r542757 checked out took 3056 MB of memory on my machine. I think after releasing 3.0.0 (or even before), we should start thinking about performance, both time- and memory-wise.
I think that could very easily involve a ground-up rewrite, which might not be a bad thing. I'm also not sure what the limits are for efficiently generating and walking ASTs for large code bases.
It could be profitable, but I don't know how feasible it would be.
I'm also not sure what the limits are for efficiently generating and walking ASTs for large code bases.
As a lower bound, we have the time and memory that phan takes without loading taint-check. T203651#5155693 was the situation some months ago, but it will change with newer phan versions. As I said in T203651#5586956, memory usage will grow a lot, and will reach 4GB for Wikibase.
T235390 will avoid the overhead of running the "base" phan twice in CI, but then we'll have to do what Brian said in the task description: a round of profiling, and see if there's any low-hanging fruit. The main complication in doing that is probably that phan becomes incredibly slow (up to 10x) when xdebug is enabled.
There's a huge performance deterioration in taint-check 3.0.1. On core it takes like 5 minutes vs "vanilla" phan taking less than one minute. We need to find out what's causing this slowness and fix it.
I should also note that the memory consumption isn't that bad, 1652 MB vs 2241 MB -- this seems to have decreased a lot since the last time I checked. Since my test run uses the shared config file, perhaps this is partly thanks to the improvements in https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/+/560958/.
I've done an initial assessment. First of all the base times (computed after removing some folders from analysis, so not reflecting actual absolute times):
AbuseFilter: 11.6s without taint-check, 28s with taint-check
Core: 57s without taint-check, 7m 17s (!!! Not sure how accurate) with taint-check
Of course, the times "without taint-check" are the minimum theoretical values that we can achieve.
[From now on, times refer to AbuseFilter]
The first thing I determined is that TaintednessBaseVisitor::analyzeFunc is taking most of the extra time. Disabling analyzeFunc brings the time down to 13.3s, which is excellent. The GetReturnObjsVisitor doesn't have a noticeable performance impact; disabling markAllDependentVarsYes (and especially the $classesNeedRefresh part) gave a runtime of 22.8s.
The real place that calls analyzeFunc lots of times is getTaintOfFunction. Removing that call brings the runtime to 15.5s. Interestingly, analyzeFunc is called without checking whether some taint info is already available. Adding that check reduces the runtime to 22s! Core went down to 4m 26s, and memory usage from 2241 MB to 2135 MB.
A test run on core reveals that this change will make taint-check miss a few things[1]. This might be due to some weird edge case, but IMHO it's certainly a price that we can pay for such a runtime reduction!
[1] - Three issues (all false positives) in maint scripts, one in includes/; two caused-by lines are less precise in maint scripts, one in includes/.
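The optimization described here, skipping analyzeFunc when taint data for a function is already known, is essentially memoization. A minimal sketch of the idea in Python (all names are illustrative, not the plugin's real PHP API):

```python
# Sketch of "don't reanalyze functions if we already have data".
# analyze_func() stands in for phan re-walking a function body (expensive);
# the cache lets every later call site reuse the first result.

taint_cache = {}  # function name -> previously computed taint info


class Func:
    def __init__(self, name):
        self.name = name


def analyze_func(func):
    # Stand-in for the expensive body analysis; the naming rule below
    # is purely illustrative.
    return {"tainted": func.name.startswith("unsafe_")}


def get_taint_of_function(func):
    # Before the fix, analyze_func() ran unconditionally on every call site.
    if func.name in taint_cache:
        return taint_cache[func.name]
    taint = analyze_func(func)
    taint_cache[func.name] = taint
    return taint


f = Func("unsafe_query")
assert get_taint_of_function(f) == get_taint_of_function(f)  # second call is a cache hit
```

The trade-off noted above applies: cached results can be stale in edge cases, which is presumably where the few missed issues come from.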
Change 588008 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/tools/phan/SecurityCheckPlugin@master] Optim: Don't reanalyze functions if we already have data
No idea. That one was a very specific edge case which may or may not be related. I even forgot what the actual cause was there...
Change 588008 merged by jenkins-bot:
[mediawiki/tools/phan/SecurityCheckPlugin@master] Optim: Don't reanalyze functions if we already have data
Next up... For core, 1m30s out of 3m30s are spent on markAllDependentVarsYes. Not sure what the best solution would be...
This turns out to be nontrivial. What this method does is: when you call someFunc( $taintedArg ), it propagates the taintedness of the argument to all variables that are set by the function. The part that's really bad for performance is the special case where one of those variables is a class property: it triggers an analysis of *all* methods in the property's class. This is obviously expensive, and as a TODO comment notes:
// TODO: This is subpar -
// * Its inefficient, reanalyzing much more than needed.
// * It doesn't handle parent classes properly
// * For public class members, it wouldn't catch uses
//   outside of the member's own class.
In short, the algorithm here is neither sound nor complete, but it catches a handful of issues (roughly 10 for core); curiously, removing that part makes another 5-10 new issues appear. Verifying whether any of these is a false positive is fairly hard, but I don't trust taint-check's judgement here.
A complete algorithm is probably too expensive, because (especially for public props) one would have to iterate over the whole codebase searching for where a property is used, and then infer that e.g. if function B uses the prop and function A calls function B, then A also depends on the prop.
A sound algorithm seems easier to achieve, but I think it would still be expensive.
A good compromise (hereafter just named "compromise") could be the following: every time a property is accessed, mark the method we're inside as dependent on that property. This is still not sound, because e.g. we shouldn't care if the property is on the LHS of an assignment, but it's certainly better than blindly analyzing all methods in a class. It's also obviously not complete. The runtime with this algorithm is negligible, and almost the same as not running markAllDependentVarsYes at all. For core, it adds roughly 5 issues, and makes 10 go away.
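The "compromise" can be pictured as a dependency map filled in during analysis: every visited property access records the enclosing method, and a tainted call then reanalyzes only the recorded methods instead of the whole class. A rough Python sketch, with hypothetical names (the real code is PHP inside the plugin):

```python
# Hypothetical sketch of the "compromise" property-dependency tracking.
from collections import defaultdict

# property name -> set of methods seen accessing it so far
prop_dependents = defaultdict(set)


def on_property_access(prop_name, current_method):
    # Called whenever a property access is visited during analysis.
    # Not sound (e.g. the access might be on the LHS of an assignment)
    # and not complete (methods not yet analyzed are missed).
    prop_dependents[prop_name].add(current_method)


def methods_to_reanalyze(tainted_prop):
    # Instead of blindly reanalyzing every method of the prop's class,
    # reanalyze only the methods recorded as dependent on it.
    return prop_dependents[tainted_prop]


on_property_access("someProp", "ClassA::render")
on_property_access("someProp", "ClassB::show")
on_property_access("otherProp", "ClassA::save")
assert methods_to_reanalyze("someProp") == {"ClassA::render", "ClassB::show"}
```

Note how this naturally catches uses outside the property's own class (ClassB above), which the "current" algorithm misses, at the cost of missing methods whose analysis hasn't happened yet.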
Another idea is to keep the current setup, but only analyze class methods if this is the first time we're doing that with the current taintedness. However, it turns out this brings absolutely no performance gain. Hence, this option isn't considered below.
Summing up:
Current
Pros: for private props, it's guaranteed not to miss anything; all uses inside the prop's own class are caught.
Cons: it's too slow, analyzing too many things; misses parent classes; misses uses outside of the class.
"Compromise"
Pros: very fast; catches stuff outside of the prop's own class.
Cons: will miss any method whose analysis is still not complete, both inside and outside the prop's own class.
Given the inability to use a sound & complete algorithm, I tend to prefer the fastest option, even if it comes with a handful of false positives or false negatives, because it's roughly an even trade.
Additionally, the "compromise" can be modified in the following way: if the property belongs to the class we're currently inside (i.e. we're probably analyzing a method of that class), switch to the "current" behaviour, and analyze all the methods in the class. Performance and results for core are the same as with the unmodified compromise, but given that it's a bit more complete, I'd go with this one.
In theory this could be further improved if we avoid analyzing the current method, but this doesn't seem to be causing any performance issue.
Long rant short: I'm going to implement the "modified compromise", but first check if the newly-appearing false positives have a different cause, and fix it if so.
Correction: there was an error in my test code. This change brings no performance degradation, but it makes some 25-30 issues disappear (19 in LogFormatter alone).
Other ideas I've been trying:
- Analyze the whole class if it has fewer than X methods. X=10 has no impact on core; X=30 has almost the same performance as the original "compromise" but catches 5-10 more issues; X=50 catches the same as X=30 but takes noticeably more time.
- Track prop dependencies before analysis, like phan's InferPureVisitor. In theory this would be great, but in practice, before analysis phan isn't able to infer what class a property belongs to (for non-static props), so it's almost useless.
- Do not analyze pure methods (requires phan's InferPureVisitor). As expected, the issues are the same as "current", but it takes 3m10s.
- A combination of the above. One that seems to work fairly well is: the modified "compromise", analyzing the whole class if it has <= 25 methods, and checking only pure methods. Core is analyzed in 2m15s, and it catches almost everything; the only exception is roughly 20 issues related to Title::getPrefixedText/getPrefixedDBKey(), which isn't seen as unsafe. But then, Title has 205 methods, so it's a bit more than 25... This special case can be fixed by annotating those methods with @return-taint tainted.
Change 596810 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/tools/phan/SecurityCheckPlugin@master] Performance: Use a different algorithm for props dependencies
Time to focus on this for real. I'm unsure about the patch above, especially because it would need some tweaks upstream for it to work. Other ideas I'm going to explore:
- Using BeforeAnalyzePhaseCapability more aggressively, building dependency graphs between methods, props, and whatnot (perhaps also taintedMethodLinks can be migrated)
- Limiting this code path to something very simple and quick, but recommending the use of --analyze-twice. Note that --analyze-twice would effectively make this path useless, but of course it takes a lot of time.
Turns out this is not so necessary. I just discovered https://github.com/phan/phan/issues/4000, and I obviously was running phan without pcntl. phan's runtime on core is around 50s, and adding taint-check results in a total of 2 minutes, which is not (too) bad.
Change 675179 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):
[mediawiki/tools/phan/SecurityCheckPlugin@master] Performance improvements
Change 675179 merged by jenkins-bot:
[mediawiki/tools/phan/SecurityCheckPlugin@master] Performance improvements
Not really, no. For comparison, phan alone (still) takes 50 seconds on MW core, so I'd hope to reach something like 1m30 at least... I also believe that the main source of slowdowns is now the backpropagation of taintedness to class members, which makes taint-check analyze the whole class of that property (and it can be a massive class like Title, or EditPage etc.). I'm not entirely sure how to optimize that, maybe something like the approach above (now obsolete).
Another problem is that phan with xdebug becomes very slow, and running it on a codebase such as MW core takes a lot of time. Not to mention that the size of the generated callgrind files is gargantuan: trying to open a callgrind file larger than ~450 MB crashes KCachegrind for me, and that is the size of the file generated after running just the first 30 tests. On MW core, 500 MB is reached after parsing ~100 files (out of a total of 6600), which is just the first step of running phan (and very fast without xdebug); it is followed by method analysis and then file analysis, which is the real heavy process. For comparison, taint-check isn't even initialized before the file analysis phase. This suggests an approximate final size of 26 GB for the parsing phase alone. I think nobody ever accounted for callgrind files this large, since the documentation on xdebug.org calls 500 MB an "enormous file".
Change 677621 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):
[mediawiki/tools/phan/SecurityCheckPlugin@master] Many more performance improvements
I managed to fix my issue with kcachegrind so I was able to load bigger cachegrind files. I also managed to profile a complete run on AbuseFilter, after removing vendor and a part of core from the parsing phase (so it only had 1500 files to parse), which gave a 6G cachegrind file. However, I believe that it's not possible to do better, unless and until code filtering is implemented in xdebug (https://bugs.xdebug.org/view.php?id=1670). That would hopefully create much smaller cachegrind files, and also clearer output.
Change 680751 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):
[mediawiki/tools/phan/SecurityCheckPlugin@master] Even more perf improvements
Change 677621 merged by jenkins-bot:
[mediawiki/tools/phan/SecurityCheckPlugin@master] Many more performance improvements
Change 680751 merged by jenkins-bot:
[mediawiki/tools/phan/SecurityCheckPlugin@master] Even more perf improvements
FTR, we're now at roughly 2m 8s for MW core. I'm now using xhprof, building a list of ignored functions via a list of classes plus reflection. This has terrible performance, taking roughly 90 minutes to run on MW core, but at least it gives me a full profile without much noise. This was used for the patches above. Unfortunately, I think there aren't many easy optimizations left, so I might need to change some higher-level logic. Also, note that r681443 will make things worse. Hopefully not as bad as the current commit message says, but still...
We should probably have a task for "move phan work into composer test jobs", I guess. We're pretty much there from a performance view (which was the blocker, beyond infrastructure concerns).
Change 682109 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):
[mediawiki/tools/phan/SecurityCheckPlugin@master] Performance improvements again
Yeah, probably. According to https://integration.wikimedia.org/ci/job/mediawiki-core-php72-phan-docker/45973/console, it seems like CI is roughly as fast as my machine, and I think 2 minutes is acceptable. As I said, I'd aim for 1m 30s in general, but that's not a blocker for moving phan into composer test.
Alright, so, I've reached a conclusion and am going to write a detailed explanation here for future reference.
One of the things that make taint-check slow is the following mechanism: suppose that we have a function-like call like someFunc( $taintedArgument ). Taint-check has a list of what variables/properties that someFunc can set. For instance, if it's a simple setter:
public function someFunc( $arg ) {
    $this->someProp = $arg;
}
here we know that whatever is given to someFunc, it gets put inside someProp. So if we find a call to someFunc with a tainted argument, we do two things:
- Add the taintedness of the argument to the property (or really any object set by someFunc). This is cheap.
- For any object being set that is also a class property, we take its class and reanalyze all methods in that class. "Reanalyze" here means telling phan to analyze the body of each method.
As you may guess, the second point isn't cheap at all. Think about what would happen for MW core if one of those properties being set belongs to a massive class such as Title, Parser, OutputPage, EditPage etc. And then imagine this thing happening, say, 15 times for each of these classes. This is obviously going to be SLOW.
You may wonder why we're triggering this massive analysis. The reason is that the last time we analyzed a given class we likely didn't know that the current property would become tainted, and so we could have missed something. However, the current implementation is neither efficient nor complete, as noted in a code comment: it will analyze methods that don't use the current property at all, and will miss any usage of that property outside of the class which declares it. But since taint-check (unlike phan) aims to report as many issues as possible, this is better than nothing.
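The two-step mechanism described above can be sketched roughly as follows, in Python with made-up names (the actual implementation is the plugin's PHP code; the class/method counts are invented for illustration):

```python
# Rough sketch of the taint-propagation mechanism described above.

class Cls:
    def __init__(self, methods):
        self.methods = methods


class Prop:
    def __init__(self, cls):
        self.cls = cls
        self.taint = 0
        self.is_class_property = True


analyzed = []  # record of (re)analyzed methods, for illustration


def analyze(method):
    analyzed.append(method)


def handle_call_with_tainted_arg(objects_set, arg_taint):
    for target in objects_set:
        # Step 1: propagate the argument's taint to everything the
        # callee can set. This is cheap.
        target.taint |= arg_taint
        # Step 2: if the target is a class property, reanalyze *all*
        # methods of its class. This is the expensive part: for a class
        # like Title with 200+ methods, one tainted call triggers a lot
        # of work, and it can happen many times per class.
        if target.is_class_property:
            for method in target.cls.methods:
                analyze(method)


big_class = Cls(["m%d" % i for i in range(200)])
prop = Prop(big_class)
handle_call_with_tainted_arg([prop], arg_taint=1)
assert prop.taint == 1
assert len(analyzed) == 200  # one tainted call reanalyzed the whole class
```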
A nice solution would be to rely on phan's "--analyze-twice" flag (T269816). With this option, once phan finishes analyzing all the files, it starts over, but keeping all information that it gathered during the first run. Since this would also preserve taintedness data, it would give us better functionality than the current code (which could be removed at that point). The obvious drawback is that a second analysis phase will consume additional time and memory. However, it's much faster than the first one, so it's not as terrible as it might seem. For comparison (on MW core):
- time: 2m 2s -> 2m 35s
- memory: 3733 MB -> 4125 MB
- +99 issues (some are duplicates, some are unused suppressions)
so I think for the long term, this should be our ideal goal.
For now, I'm going to take another approach: limit the number of times that a class can be reanalyzed to 1 (per class). This will obviously hide some issues; however, I think that it is an acceptable compromise. After all, we'd get even better results if we reanalyzed the whole codebase whenever a property changes (so as to catch usages outside of its own class), but nobody would ever implement that. FTR, with the capping approach:
- time: 2m 2s -> 1m 54s
- memory: 3733 MB -> 3660 MB
- +10 issues overall, some are new, some are unused suppressions, 3-4 old issues disappear
but most importantly, it would allow us to do r681443. With that patch, we'll preserve "links" (i.e. what methods can change what variables and vice versa) through function calls, which makes analysis of function calls much more precise (see test). More "links" means more occasions to trigger an analysis of a class, which slows everything down. But with the limit in place, there's basically no difference, performance-wise.
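The cap can be pictured as a per-class counter consulted before triggering the expensive whole-class reanalysis. Again, a hypothetical Python sketch, not the plugin's actual code:

```python
# Sketch of "limit the number of times that a class can be reanalyzed".

reanalysis_count = {}
MAX_REANALYSES_PER_CLASS = 1


def maybe_reanalyze_class(cls_name, analyze_class):
    # Allow each class to be fully reanalyzed at most once; later
    # taintedness changes to its properties no longer trigger the
    # expensive whole-class pass. This hides some issues, but it
    # bounds the cost even when "links" create many trigger points.
    count = reanalysis_count.get(cls_name, 0)
    if count >= MAX_REANALYSES_PER_CLASS:
        return False
    reanalysis_count[cls_name] = count + 1
    analyze_class(cls_name)
    return True


runs = []
assert maybe_reanalyze_class("Title", runs.append) is True
assert maybe_reanalyze_class("Title", runs.append) is False  # capped
assert runs == ["Title"]
```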
Change 682245 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):
[mediawiki/tools/phan/SecurityCheckPlugin@master] Limit the number of times that a class can be reanalyzed
Change 596810 abandoned by Daimona Eaytoy:
[mediawiki/tools/phan/SecurityCheckPlugin@master] Performance: Use a different algorithm for props dependencies
Reason:
Outdated.
Change 682109 merged by jenkins-bot:
[mediawiki/tools/phan/SecurityCheckPlugin@master] Performance improvements again
Change 682245 merged by jenkins-bot:
[mediawiki/tools/phan/SecurityCheckPlugin@master] Limit the number of times that a class can be reanalyzed
@Daimona: All related patches in Gerrit have been merged. I'm closing this task as resolved. If there is more work, please reopen and elaborate. Thanks.
I think it's fine to close this: improving performance is one of those neverending tasks (unless you have specific goals, which we didn't have here). This task can be reopened (or a new one created) if we find taint-check to be too slow.