Page MenuHomePhabricator

Formalize exception handling and annotations in MediaWiki and enforce it in CI
Open, Needs TriagePublic

Description

The coding conventions say very little about exception handling in MediaWiki. In fact, the only relevant paragraph is:

Exceptions that indicate programming errors should be one of the exceptions that ship with PHP or a more specific subclass, while exceptions that indicate errors that are relevant to the end user should be an ErrorPageError or one of its subclasses.

Effectively, this means that exception handling is done differently in different areas or codebases, depending on who wrote or approved certain code changes. In particular, I would like to focus on two aspects:

  • What exception classes should be thrown
  • When and how to use @throws annotations

Over the past years, I've seen many commits introducing code that throws or handles exceptions, and the general consensus amongst developers (which actually goes beyond MediaWiki) is that:

  • You should never throw the base Exception class; instead, either use another exception available in the SPL, or define a new exception type.[1]
  • SPL exceptions (like InvalidArgumentException or UnexpectedValueException) are generally not meant to be caught.
  • @throws annotations should be reserved for checked exceptions, that is, exceptions that callers should either catch or document in their own doc comment.

Additionally, there seems to be some uncertainty as to when MWException should be used (either directly or as a base class) [2], whether it should be caught, etc. All in all, this produced inconsistent code based on varying assumptions, and such code cannot be validated statically (at least for what concerns exception handling).

Therefore, I am proposing that:

  • We agree on how exception handling should work in MediaWiki, potentially using the rules above (de facto, the status quo) as a starting point.
  • The new rules should be documented in the coding conventions, in a dedicated section about exception handling.
  • These rules should be enforced by linters (PHPCS) or static analysis tools (phan).
  • Possibly, the purpose of MWException should be documented and clarified, both in the class comment and in the coding conventions.

In particular, I think the linters should do the following:

  • Flag code that throws Exception (this can be done with PHPCS)
  • Ensure that if a callee has @throws annotations, the caller either catches those exceptions, or documents them in its own doc comment (phan can do this with the warn_about_undocumented_exceptions_thrown_by_invoked_functions config option)
  • Possibly: flag code that throws a checked exception but does not document it in the doc comment. Phan can do this with the warn_about_undocumented_throw_statements option; the problem is that phan doesn't know which exceptions are (un)checked. The config option exception_classes_with_optional_throws_phpdoc can be used to specify that, but I'm not sure if the current exception hierarchy distinguishes checked vs unchecked exceptions clearly enough for this.

I imagine a lot of code might have to be cleaned up, so the proposal is that we enable the new rules by default, but disable them on repos where they fail. They can then be re-enabled incrementally as noncompliant code is rewritten. Related: T240672: Clean up unchecked exceptions.


[1] A good question here would be if there are other exception classes that should not be thrown directly. Opinions about this seem to vary.
[2] As a matter of fact, the class name does not explain its purpose. The doc comment of that class only says "MediaWiki exception". I would say it's not incredibly helpful, is it?

Event Timeline

This seems good. Is this something that should go through TDMP?

This seems good. Is this something that should go through TDMP?

Good question, I'm not familiar with the process.


Also, here are examples of compliant and noncompliant code:

throw new Exception( 'Somethin bad happened' ); // Noncompliant: should use a more specific exception type. NOTE: make sure "Exception" is not an alias of something else...

function test() {
   throw new SomeCheckedException(); // Noncompliant: exception is not documented in the doc comment
}

/**
 * @throws SomeCheckedException
 */
function test2() {
   throw new SomeCheckedException(); // Compliant
}

function test3() {
   test2(); // Noncompliant: should either catch the exception or document it
}

function test4() {
   try {
      test2(); // Compliant
   } catch ( SomeCheckedException $e ) {
      //...
   }
}

/**
 * @throws SomeCheckedException
 */
function test4() {
   test2(); // Compliant
}

try {
   doSomething();
} catch ( InvalidArgumentException $e ) { // Noncompliant, this exception is not meant to be caught.
}

And finally, here's the output of phan on MediaWiki core with warn_about_undocumented_exceptions_thrown_by_invoked_functions enabled: P36582. As you can see, it catches a lot of things, and this is without warn_about_undocumented_throw_statements. We wouldn't have to fix all of them immediately, of course.

Oh, and something I forgot to say. Another question would be how to document unchecked exception. For instance, if you have a method that throws InvalidArgumentException if given something invalid. Presumably, you'd want callers to know that the exception can be thrown without reading the implementation, even if the exception shouldn't be caught. @throws InvalidArgumentException would be disallowed in that case. So we should perhaps agree on a different way of documenting that.

In fact, it'd be great if we could get a linter (probably phan) to warn if @throws is used with unchecked exception types. I don't think this is currently possible, but I can request this feature upstream.

@Daimona wrote in the task description:

Additionally, there seems to be some uncertainty as to when MWException should be used.

The main value of MWException was to have a skinned and localised "Internal error" page rendered via our exception handler. I don't recall if it was originally introduced for that, but that's effectively what it ended up as. Over the past two decades, the PHP ecosystem (both built-ins and vendor libraries) have moved from warnings or custom callbacks towards proper exceptions. This meant an increasing scope of uncaught exceptions would return in near-blank page errors instead of "pretty" ones.

Since about 5 years now, the MException code has moved out of there and into the general exception handler (now named MWExceptionRenderer), and invoked by both the runtime exception handler (and entry point try-catch). I recall a drive from @ori, myself, and others (ref T86704) to remove usage of MWException. It was never useful in catch conditions (and afaik has no significant use as such), and for throwing or extending, I think we can use other exceptions instead. MWException isn't a useful middleground I think.

Traditionally, the unchecked exceptions are the subclasses of RuntimeException or LogicException (which includes all SPL exceptions). E.g. PHPStorm has that assumption built in.

Another thing that should be mentioned in the documentation is ILocalizedException and INormalizedException.

@Daimona wrote in the task description:

Additionally, there seems to be some uncertainty as to when MWException should be used.

The main value of MWException was to have a skinned and localised "Internal error" page rendered via our exception handler. I don't recall if it was originally introduced for that, but that's effectively what it ended up as. Over the past two decades, the PHP ecosystem (both built-ins and vendor libraries) have moved from warnings or custom callbacks towards proper exceptions. This meant an increasing scope of uncaught exceptions would return in near-blank page errors instead of "pretty" ones.

Since about 5 years now, the MException code has moved out of there and into the general exception handler (now named MWExceptionRenderer), and invoked by both the runtime exception handler (and entry point try-catch). I recall a drive from @ori, myself, and others (ref T86704) to remove usage of MWException. It was never useful in catch conditions (and afaik has no significant use as such), and for throwing or extending, I think we can use other exceptions instead. MWException isn't a useful middleground I think.

Thanks for the useful context; I didn't realize we already had a task about it. In this case, I think we don't have to worry about MWException for this task, and talk about its future in T86704 instead.

Traditionally, the unchecked exceptions are the subclasses of RuntimeException or LogicException (which includes all SPL exceptions). E.g. PHPStorm has that assumption built in.

I think that's a sensible default in general. I was just wondering if that assumption would hold true in MediaWiki, i.e.: do we have any checked exception that inherits from RuntimeException or LogicException? Also, since the default makes sense, we could simply just flag usages of (subclasses of) Runtime and LogicException with phan.

Another thing that should be mentioned in the documentation is ILocalizedException and INormalizedException.

I know pretty much nothing about them, so yes, it might be a good idea to document their usage :)


All in all, my proposal has become:

  • Add the following to a dedicated section of the PHP coding conventions:
    • Never throw the base Exception class; instead, either use another exception available in the SPL, or define a new exception type.
    • (Subclasses of) RuntimeException and LogicException represent unchecked exceptions, so they should not be caught, nor documented with @throws
    • @throws annotations should be reserved for checked exceptions, that is, exceptions that callers should either catch or document in their own doc comment with @throws.
    • TBD: documentation of unchecked exceptions, see below
  • These rules would be enforced by linters as follows:
    • Flag code that throws Exception; possibly with PHPCS, but watch out for alias and namespaces. Otherwise, use phan.
    • Flag @throws annotations used with unchecked exceptions (subclasses of RuntimeException and LogicException); should be done with phan, but this feature doesn't seem to exist. I can ask upstream and perhaps write a PR.
    • Ensure that if a callee has @throws annotations, the caller either catches those exceptions, or documents them in its own doc comment (phan's warn_about_undocumented_exceptions_thrown_by_invoked_functions)
    • Flag code that throws a checked exception but does not document it in the doc comment (phan's exception_classes_with_optional_throws_phpdoc)
  • These rules would be enabled by default for phan/phpcs, but would initially be disabled on repos with violations. As those violations are fixed, the rules would be incrementally enabled.

For me, the big question is how to document unchecked exceptions (T321683#8346466). @throws is not an option. I don't think the manual has to prescribe a specific syntax, but maybe it could suggest one. For instance, something like @note This throws XYZException if .... I'd really like to hear more thoughts about this, though.

[…] Another question would be how to document unchecked exception. […] @throws InvalidArgumentException would be disallowed in that case. So we should perhaps agree on a different way of documenting that.

[…]
For me, the big question is how to document unchecked exceptions […]. For instance, something like @note This throws XYZException if ....

I thought that, implied in disallowing documententation for @throws about "must not catch" exceptions, that we don't want to document them. What's the use case for documenting this — in a different way? Who is the intended audience?

Over the past year, myself and overs have removed countless of such annotations without replacement. I generally found them needlessly repeating an implementation detail that can be gleaned from the code right below it already. Noting that of course, we're talking specifically about exceptions that aren't considered part of the stable API, and aren't meant to be caught or cared for by callers. I think in a handful of cases something useful was written after the @throws line, in which case I copied it to the method description or the relevant parameter description. E.g. if it was describing the required type or format of a parameter.

[…] Another question would be how to document unchecked exception. […] @throws InvalidArgumentException would be disallowed in that case. So we should perhaps agree on a different way of documenting that.

[…]
For me, the big question is how to document unchecked exceptions […]. For instance, something like @note This throws XYZException if ....

I thought that, implied in disallowing documententation for @throws about "must not catch" exceptions, that we don't want to document them. What's the use case for documenting this — in a different way? Who is the intended audience?

It would not be about documenting the exception type, but the fact that some preconditions are truly checked and not just assumed to be true. Especially in old code, it's not hard to find documented preconditions like "argument X should Y" that are not actually checked in the code. Old-style code may then ignore the preconditions because "it'll work anyway". By clearly stating that the preconditions are enforced (by throwing an exception), callers would know that they really need to validate the arguments. That said:

Over the past year, myself and overs have removed countless of such annotations without replacement. I generally found them needlessly repeating an implementation detail that can be gleaned from the code right below it already. Noting that of course, we're talking specifically about exceptions that aren't considered part of the stable API, and aren't meant to be caught or cared for by callers.

I do agree that it's an implementation detail, and that in a perfect world, there'd be no need to document it.

I think in a handful of cases something useful was written after the @throws line, in which case I copied it to the method description or the relevant parameter description. E.g. if it was describing the required type or format of a parameter.

Oh and yes, I'm talking specifically about cases where the @throws annotation contains something useful. In fact, I think the only reason why you'd want to document an unchecked exception is to clearly state that «an exception will be thrown if X». The type of the exception should not be relevant, only the fact that the exception is thrown is relevant.

I think it'd be great if the documentation said something about this, along the lines of:

When needed, unchecked exceptions should be documented in the method or param description, and not with @throws.

Clarifying what "needed" means, and perhaps providing an example.

I've just added a section to the coding conventions page. Improvements are welcome! I will look into enforcing these rules with phpcs/phan.

One thing that came out while reviewing r850537 is that phan does not inherit @throws annotations in any way, so I've filed https://github.com/phan/phan/issues/4757 to see if it's possible to add this feature upstream. Otherwise we would have to add @throws annotations in most special pages and API modules.

Change 926828 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan@master] Add plugin to disallow use of `new Exception`

https://gerrit.wikimedia.org/r/926828

On second thought, I believe implementing a rule against new Exception is a task for phan, as we need some understanding of the context of the new statement (e.g., to distinguish the base Exception and a namespaced class that is also called Exception).

And this is partly blocked on upstream issue https://github.com/phan/phan/issues/4757 for the other phan checks.

Change 940511 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] HTMLFileCache: Remove `@throws` for unchecked exception

https://gerrit.wikimedia.org/r/940511

Change 940511 merged by jenkins-bot:

[mediawiki/core@master] HTMLFileCache: Remove `@throws` for unchecked exception

https://gerrit.wikimedia.org/r/940511

And this is partly blocked on upstream issue https://github.com/phan/phan/issues/4757 for the other phan checks.

Released as 5.4.3 yesterday - https://github.com/phan/phan/releases/tag/5.4.3

Change 989246 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/tools/phan/SecurityCheckPlugin@master] Bump phan/phan to 5.4.3

https://gerrit.wikimedia.org/r/989246

Change 989247 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/tools/phan@master] [WIP] Bump phan/phan to 5.4.3

https://gerrit.wikimedia.org/r/989247

Change 989246 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Bump phan/phan to 5.4.3

https://gerrit.wikimedia.org/r/989246

Change 989247 abandoned by Jforrester:

[mediawiki/tools/phan@master] [WIP] Bump phan/phan to 5.4.3

Reason:

Let's go with I9a867a52b9fb5dee1a130dbef4cf076ec3ae09f0 instead.

https://gerrit.wikimedia.org/r/989247