Page MenuHomePhabricator

Move hardcoded taintedness data from taint-check to annotations in MW core
Open, Needs TriagePublic

Description

This is a long overdue change, and hardcoded stuff is making it hard to rename classes, see T166010#8349431.

Potential plan, copied from T166010#8349594:

  • Copy annotations from taint-check to WebRequest (and potentially other classes, but not all of them for now)
    • Backport said change to all supported releases, just to be sure.
  • Add a test in core to ensure that said annotations are working (i.e., write a test file that uses the unsafe methods, make sure that it's analyzed by phan, make sure it reports issues). Ideally, this would be done for all methods hardcoded in taint-check, not just the ones we're migrating now.
  • Resolve T291743 so that caused-by lines are easier to read when annotations are involved
  • In taint-check, remove hardcoded taintedness from methods that are now annotated in core
  • Release new version of taint-check and mw-phan, upgrade to it in core
  • Finally, move the WebRequest class and its friends.

Event Timeline

Change 850177 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] Add tests for taint-check

https://gerrit.wikimedia.org/r/850177

One problem with the test... We should make sure that taint-check has taint data for aliased classes (i.e., \MediaWiki\Shell\Result, and classes moved for T166010). However, class aliases are disabled in the phan configuration for core, because the aliases are only there for BC and core should not use old class names. But this also means that we can't test them. I'm not sure what to do here...

One problem with the test... We should make sure that taint-check has taint data for aliased classes (i.e., \MediaWiki\Shell\Result, and classes moved for T166010). However, class aliases are disabled in the phan configuration for core, because the aliases are only there for BC and core should not use old class names. But this also means that we can't test them. I'm not sure what to do here...

Aren't we meant to use deprecated class helpers for alerting people about using old class names, rather than phan erroring that it doesn't know the existence of the class?

One problem with the test... We should make sure that taint-check has taint data for aliased classes (i.e., \MediaWiki\Shell\Result, and classes moved for T166010). However, class aliases are disabled in the phan configuration for core, because the aliases are only there for BC and core should not use old class names. But this also means that we can't test them. I'm not sure what to do here...

Aren't we meant to use deprecated class helpers for alerting people about using old class names,

Maybe? But phan doesn't have this feature, unfortunately. You can only configure whether it will parse aliases, but then it treats aliases the same as the actual classes, and does not flag usage of an alias. This would definitely be a good addition to phan.

rather than phan erroring that it doesn't know the existence of the class?

It only reports those issues in core though, as the default setting in our config is to read class_alias, and that is true in I think every MW repo except for core.

One problem with the test... We should make sure that taint-check has taint data for aliased classes (i.e., \MediaWiki\Shell\Result, and classes moved for T166010). However, class aliases are disabled in the phan configuration for core, because the aliases are only there for BC and core should not use old class names. But this also means that we can't test them. I'm not sure what to do here...

Aren't we meant to use deprecated class helpers for alerting people about using old class names,

Maybe? But phan doesn't have this feature, unfortunately. You can only configure whether it will parse aliases, but then it treats aliases the same as the actual classes, and does not flag usage of an alias. This would definitely be a good addition to phan.

rather than phan erroring that it doesn't know the existence of the class?

It only reports those issues in core though, as the default setting in our config is to read class_alias, and that is true in I think every MW repo except for core.

Yup. I'm suggesting enabling enable_class_alias_support for phan in core as well as everywhere else (i.e. revert the config change in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/536327 ), and relying on other routes to warn about this concern. But as that patch shows, at least at that point, the deprecated class helper wasn't sufficient to actually get people to use the right classes…

Yup. I'm suggesting enabling enable_class_alias_support for phan in core as well as everywhere else (i.e. revert the config change in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/536327 ), and relying on other routes to warn about this concern. But as that patch shows, at least at that point, the deprecated class helper wasn't sufficient to actually get people to use the right classes…

Relying on other routs would be great, but I don't think they exist at the moment... Re-enabling class alias support for core would solve some issues, but introduce others. As I said, I'm not sure what the best solution would be here...

Yup. I'm suggesting enabling enable_class_alias_support for phan in core as well as everywhere else (i.e. revert the config change in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/536327 ), and relying on other routes to warn about this concern. But as that patch shows, at least at that point, the deprecated class helper wasn't sufficient to actually get people to use the right classes…

Relying on other routs would be great, but I don't think they exist at the moment...

Yeah. :-(

Re-enabling class alias support for core would solve some issues, but introduce others. As I said, I'm not sure what the best solution would be here...

In this case, I think "able to do refactors" outweighs "irritating use of old classes which makes patches slightly slower when the time for removal comes along".

Re-enabling class alias support for core would solve some issues, but introduce others. As I said, I'm not sure what the best solution would be here...

In this case, I think "able to do refactors" outweighs "irritating use of old classes which makes patches slightly slower when the time for removal comes along".

Maybe, but I'm not fully convinced... In general, I think having phan warn when a class alias is used would help us a lot, and not just with this task. T166010 will result in a bazillion of class aliases, and having something tell you when you're using one would be super useful. I've filed https://github.com/phan/phan/issues/4740 upstream.

Re-enabling class alias support for core would solve some issues, but introduce others. As I said, I'm not sure what the best solution would be here...

In this case, I think "able to do refactors" outweighs "irritating use of old classes which makes patches slightly slower when the time for removal comes along".

Maybe, but I'm not fully convinced... In general, I think having phan warn when a class alias is used would help us a lot, and not just with this task. T166010 will result in a bazillion of class aliases, and having something tell you when you're using one would be super useful. I've filed https://github.com/phan/phan/issues/4740 upstream.

My vague expectation is to work on a bot that auto-corrects imports and run it in big sweeps so we can kill off the aliases; phan warning about this would be reasonable, but it's not the same category of warning as we normally use phan for. Maybe it could be a script powered by phan but with a different flag?

phan warning about this would be reasonable, but it's not the same category of warning as we normally use phan for.

That's true, all issues related to deprecated or internal classes are disabled in our config. However, I think it might still be useful to report usage of class aliases. The main reason being that replacing deprecated/internal methods might be hard, but replacing a class alias is generally just a string replacement (plus adding use).

Maybe it could be a script powered by phan but with a different flag?

We could use a custom plugin, but I don't know how hard it would be to write it.

Maybe it could be a script powered by phan but with a different flag?

We could use a custom plugin, but I don't know how hard it would be to write it.

I was just thinking a local Python script I ran, but sure. :-)

Given the status quo, I think at the moment it's better to re-add class_alias support to MW core. This is not what I'd want long term, but we need to do something now.

Change 960166 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] phan: Re-enable class_alias support

https://gerrit.wikimedia.org/r/960166

Change 960171 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] Add first batch of taint-check annotations

https://gerrit.wikimedia.org/r/960171

Change 960173 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] Add second batch of taint-check annotations

https://gerrit.wikimedia.org/r/960173

Change 960176 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] Add taint-check annotations to database methods

https://gerrit.wikimedia.org/r/960176

Change 960179 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] Add another batch of taint-check annotations

https://gerrit.wikimedia.org/r/960179

Change 960166 merged by jenkins-bot:

[mediawiki/core@master] phan: Re-enable class_alias support

https://gerrit.wikimedia.org/r/960166

Change 850177 merged by jenkins-bot:

[mediawiki/core@master] Add tests for taint-check

https://gerrit.wikimedia.org/r/850177

Change 960171 merged by jenkins-bot:

[mediawiki/core@master] Add first batch of taint-check annotations

https://gerrit.wikimedia.org/r/960171

Change 960173 merged by jenkins-bot:

[mediawiki/core@master] Add second batch of taint-check annotations

https://gerrit.wikimedia.org/r/960173

Change 960176 merged by jenkins-bot:

[mediawiki/core@master] Add taint-check annotations to database methods

https://gerrit.wikimedia.org/r/960176

Change 960179 merged by jenkins-bot:

[mediawiki/core@master] Add another batch of taint-check annotations

https://gerrit.wikimedia.org/r/960179

Change 960706 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan/SecurityCheckPlugin@master] Use a "fake" plugin for integration tests

https://gerrit.wikimedia.org/r/960706

Change 960707 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan/SecurityCheckPlugin@master] Drop a good chunk of custom func taints from the MediaWiki plugin

https://gerrit.wikimedia.org/r/960707

Change 960706 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Use a "fake" plugin for integration tests

https://gerrit.wikimedia.org/r/960706

Change 960707 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Drop a good chunk of custom func taints from the MediaWiki plugin

https://gerrit.wikimedia.org/r/960707

Change 961251 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan/SecurityCheckPlugin@master] Inherit method annotations from parents anywhere in the hierarchy

https://gerrit.wikimedia.org/r/961251

Change 961251 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Inherit method annotations from parents anywhere in the hierarchy

https://gerrit.wikimedia.org/r/961251

Just a quick update: there are a bunch of methods still left to migrate, mostly shell stuff (that would also need to be moved over to shellbox, not MW core)

Revisiting this, would it be possible to run two phan jobs, the main one with enable_class_alias_support set false, and a second one covering the integration tests where it's set true?

Revisiting this, would it be possible to run two phan jobs, the main one with enable_class_alias_support set false, and a second one covering the integration tests where it's set true?

Yes, but it might be somewhat messy... In terms of both CI setup, and configuring phan. I'd really, really like https://github.com/phan/phan/issues/4740 to be resolved, but I guess it's not going to happen anytime soon. Last time I checked, it would be feasible to treat it the same as @deprecated, but that still doesn't work in doc comments and function signatures, which I think are the vast majority of the usages.

Revisiting this, would it be possible to run two phan jobs, the main one with enable_class_alias_support set false, and a second one covering the integration tests where it's set true?

Yes, but it might be somewhat messy... In terms of both CI setup, and configuring phan.

How about we move the phan command into the composer test command (with regular config), and then core can have a bespoke special phan job for testing taint-check in-situ that also over-rides enable_class_alias_support to true?

I'd really, really like https://github.com/phan/phan/issues/4740 to be resolved, but I guess it's not going to happen anytime soon. Last time I checked, it would be feasible to treat it the same as @deprecated, but that still doesn't work in doc comments and function signatures, which I think are the vast majority of the usages.

Ack. :-(

How about we move the phan command into the composer test command (with regular config)

That would be T280990... Which I guess has pros and cons?

and then core can have a bespoke special phan job for testing taint-check in-situ that also over-rides enable_class_alias_support to true?

Maybe, but maybe it's a bit overkill? It might possible to run the phan tests as a PHPUnit test, just like the tests in taint-check itself, and in mediawiki-phan-config. Doing it should be easy enough, doing it without duplicating the usual 100 lines of boilerplate might be, huh, slightly harder...

Change #1128518 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/tools/phan@master] config: Update remaining addGlobalsWithTypes calls to namespaced classes

https://gerrit.wikimedia.org/r/1128518

Change #1128518 merged by jenkins-bot:

[mediawiki/tools/phan@master] config: Update remaining addGlobalsWithTypes calls to namespaced classes

https://gerrit.wikimedia.org/r/1128518

There are a few more left, see https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/tools/phan/SecurityCheckPlugin/+/9a021f3704adecc4a060e6deccf24462464a1f2a/MediaWikiSecurityCheckPlugin.php#88. From a quick glance:

  • Database stuff needs to remain here due to complex taintedness shape which is not exposed in annotations; filed T411040 and made it a blocker.
  • Shell stuff looks like it could be migrated
  • Status(Value) stuff is there to address performance issues / limitations of taint-check. It should probably remain there unless we can improve the underlying performance issue. If the issue is fundamentally impossible to prevent (I don't remember the details), then I think moving them to core is OK.

I'm not planning to work on this soon though.