Page MenuHomePhabricator

Consider enabling stricter type checks in phan
Open, Needs TriagePublic

Description

Phan comes with a few options, disabled by default, to make type analysis much stricter. From the manual:

  • strict_method_checking
  • strict_object_checking
  • strict_param_checking
  • strict_property_checking
  • strict_return_checking

We could consider enabling this to make our code more robust and prevent errors such as those we're seeing with the PHP 8.1 migration (T379874). Unfortunately, MediaWiki code is considerably far from this level of strictness, and for each of the issue types above we have hundreds of issues in MW core alone. Some possible solutions and patterns that stand out:

  • Documenting array shapes more thoroughly (phan can also enforce this with UnknownElementTypePlugin, but it is admittedly quite annoying for arrays of scalars).
  • Being stricter in documentation. For example, query methods in IDatabase can return false if there's an error and the internal QUERY_SILENCE_ERRORS flag is set, but we typically ignore this fact in 99.9% of our code. This would not pass strict checks.
  • A super common pattern: methods that return a nullable type (e.g., Title|null) where some callers use the return value without checking for null. If it can be null, it should be handled (potentially with an assertion). If it (generally) can't, either make the return value not nullable, or replace nullability with a checked exception (or a separate LBYL method + an unchecked exception).
  • Class properties with nullable type where the property is not initialized in the constructor, but rather in a separate initialization method that needs to be called before using the property. I find these confusing regardless of tools. Could use a getter with an assertion.

Details

Related Changes in Gerrit:

Event Timeline

For the specific case of QUERY_SILENCE_ERRORS, it seems to be only used inside the rdbms lib according to codesearch. Maybe it could be outright removed or at least turned into a private error-suppressing method, which would simplify things a lot for external callers—especially since things like "this can only return false if a specific argument was passed" are not something phan can statically verify.

For the specific case of QUERY_SILENCE_ERRORS, it seems to be only used inside the rdbms lib according to codesearch. Maybe it could be outright removed or at least turned into a private error-suppressing method,

Yup, the "private error-suppressing method" approach is what I had in mind. I'm just not sure if that would be enough for the existing use cases. Maybe?

For the specific case of QUERY_SILENCE_ERRORS, it seems to be only used inside the rdbms lib according to codesearch.

Please note that codesearch may not find everything, as there are multiple ways to specify this flag:

  • ISQLPlatform::QUERY_SILENCE_ERRORS – this is trivial to find
  • SLQPlatform::QUERY_SILENCE_ERRORS or any other implementing class – this isn’t hard to find either
  • 1, 3 etc. – this shouldn’t be done, but if it is, it’s nearly impossible to find
  • true – a comment specifically notes that the value 1 was chosen because (int)true is 1, and this is needed for backward compat – this is also nearly impossible to find, but it’s actually possible to happen (even if it’s been replaced by flags back in 2019 in 108fd8b18c1084de7af0bf05831ee9360f595c96 and Phan has been rejecting it since then, probably not all extensions use Phan).

And I couldn’t find anything that would say this flag is internal, so it likely needs to go through the deprecation procedure if we want to remove it from the publicly used interface (which I support, just ask you to be careful).


What about first enabling these options in smaller extensions, which are easier to fix? Of course, without core having them, they cannot help with code interfacing with core, but at least it would be used somewhere, and internal code of those extensions would be safer.

  • 1, 3 etc. – this shouldn’t be done, but if it is, it’s nearly impossible to find

I wouldn't feel sorry for breaking such code, to be honest...

And I couldn’t find anything that would say this flag is internal, so it likely needs to go through the deprecation procedure if we want to remove it from the publicly used interface (which I support, just ask you to be careful).

I seem to recall that it is de facto internal, but I guess it's possible that this isn't documented (I haven't checked).

What about first enabling these options in smaller extensions, which are easier to fix?

I tried a while ago in CampaignEvents, that's relatively strongly typed and not too large, and I was greeted with lots of errors due to core interfaces. It's basically impossible to enable it for extensions if we don't fix at least some of core's public interfaces first.

  • 1, 3 etc. – this shouldn’t be done, but if it is, it’s nearly impossible to find

I wouldn't feel sorry for breaking such code, to be honest...

Well, if this was an otherwise clearly internal interface (and didn’t have the history of being previously a boolean), I wouldn’t feel very sorry either. However, with all these factors added together, I think the deprecation process is warranted. (By the way, maybe this should be a child task, so that MediaWiki-libs-Rdbms can be tagged.)

I tried a while ago in CampaignEvents, that's relatively strongly typed and not too large, and I was greeted with lots of errors due to core interfaces. It's basically impossible to enable it for extensions if we don't fix at least some of core's public interfaces first.

Was that so bad that it couldn’t have been fixed in core? If only the portion used by that extension needs to be fixed in core, that’s probably a much smaller core patch.

Change #1115506 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/docroot@master] build: make Phan stricter

https://gerrit.wikimedia.org/r/1115506

Change #1115506 merged by jenkins-bot:

[integration/docroot@master] build: make Phan stricter

https://gerrit.wikimedia.org/r/1115506

Gerrit #1115506 is an example for turning Phan strictness. On that code base, most of the "issues" were due to not handling PHP functions returning false on error. That is strict_return_checking.