Page MenuHomePhabricator

Prohibit use of 'object' as type in docblocks
Closed, ResolvedPublic

Description

I propose that the 'object' type be prohibited via a PHPCS sniff in docblocks, when used with annotations like @param, @return, @property etc. Rationale:

  • A common mistake is to confuse object with stdClass. However, the former accepts *any* object (i.e. anything that is not scalar and not array), while the latter only accepts what people may call "objects", i.e. stdClass instances
  • object as a documented type, when it's not used for stdClass, is very unhelpful, because it doesn't convey any information about the parameter. Knowing that it's not a scalar will not save me from reading the method body to figure out the type
  • In particular, the docblock should always explicitly mention what types are expected by the method. Again, object is not really different from a complete lack of documentation.
  • However, object can still be allowed as a typehint, for situations like the following, assuming that a vague typehint is better than no typehint. Note, I wouldn't encourage this practice: it might make more sense to specialize the method, or split it in two, or create a base interface. But I think it shouldn't be forbidden either.
/**
 * @param TypeA|TypeB $foo
 */
function foo ( object $foo ) {
}

As always, the warning can be suppressed in "special cases" where using object is legit. I don't have any examples for this. OTOH, codesearch will give you many examples where object is used in place of stdClass.

A caveat: make sure that PHPCS won't complain about typehint not matching docblock in the example above.

Event Timeline

Sounds good, I would also say that autofix seems bad here, because it is not always clear that object and stdClass are the same here

Sounds good, I would also say that autofix seems bad here, because it is not always clear that object and stdClass are the same here

Ah, right! I agree, we should not provide an autofix.

Change 632993 had a related patch set uploaded (by DannyS712; owner: DannyS712):
[mediawiki/tools/codesniffer@master] Expand FunctionCommentSniff to error on object typehints

https://gerrit.wikimedia.org/r/632993

To be honest I can't fully follow the arguments given in the task description. I have seen examples like the mentioned TypeA|TypeB several times, even wrote them myself. What's the problem we are trying to solve here? Sure, it is true: In many cases a type hint should not be object, but stdClass. But this doesn't make object wrong. It's a bit like saying a method expects an int without specifying if it accepts negative integers, or not. It's just a little vague. That's true for a lot of the documentation we write. string doesn't mention if the string can be empty. double doesn't mention if NAN is included or not. User doesn't mention if anonymous users are included or not. And object is as vague, failing to mention if it's limited to stdClass objects, or allows more than that. Sure, this can probably be improved in many situations. But it's not always wrong. So why prohibit it?

Can you please update this task's title so it mentions the motivation? Why do we need to stop people from using object?

To be honest I can't fully follow the arguments given in the task description. I have seen examples like the mentioned TypeA|TypeB several times, even wrote them myself.

Can you please link some of these examples? I looked at codesearch before opening this task, and the majority of the @param obect occurrences actually meant stdClass. For instance, compare this search (note: it exclude things like @param object ClassName, which are surprisingly common) with this one: 118 out of 305 files are almost certainly using object instead of stdClass. I'd bet that most of the remaining ones are wrong, too, although one would have to check them one by one. I did this on some of them, and could only find a couple of places specifically working with abstract objects.

Sure, it is true: In many cases a type hint should not be object, but stdClass. But this doesn't make object wrong. It's a bit like saying a method expects an int without specifying if it accepts negative integers, or not. It's just a little vague. That's true for a lot of the documentation we write. string doesn't mention if the string can be empty. double doesn't mention if NAN is included or not. User doesn't mention if anonymous users are included or not.

Watch out, this is close to a strawman, because there's a substantial difference between object/stdClass and the other examples you mentioned: for object, we have an obvious replacement that is stdClass; but we don't have a "standard" type that we can use to represent negative integers, or empty strings, or NAN, or anonymous user. That can only be done by adding a comment to the parameter. Of course, the comment can also be added for object meaning stdClass, BUT in this case, we can avoid the comment and just replace the type with the correct one.

And object is as vague, failing to mention if it's limited to stdClass objects, or allows more than that.

Exactly. In one word: ambiguous.

Sure, this can probably be improved in many situations. But it's not always wrong.

Right. Part of my motivation is that the "correct" use cases for object are very rare, and in particular rare enough that suppressing the warning might be the best choice (i.e. the few use cases are not enough to justify the many imprecise uses).

Can you please update this task's title so it mentions the motivation? Why do we need to stop people from using object?

The reasons are the ones mentioned above and in the description, with an important remark: stdClass makes static analysis more precise and less error-prone. Take the following example:

/**
 * @param object $row
 */
function foo( $row ) {
  echo $row->my_field;
}

$obj = new SomeClass;
foo( $obj ); // Oh noes! Static analysis will not tell me this is wrong.

this is a bit simplified, but I hope it still conveys the idea of a bug that might be easily avoided.

Also, a mostly irrelevant comment: some of the reasons against object can also be referred to mixed, but there are some important differences:

  • There won't be a mixed typehint until PHP8 (https://wiki.php.net/rfc/mixed_type_v2)
  • Using mixed is more common, since the language is/was loosely typed (w.r.t scalars)
  • I think mixed is used quite often with the meaning of "scalar", the alternative being to specify all (common) scalar types each time (string|int|float|bool|null)

I realized I misunderstood this task. This is not about disallowing object as a strict type hint in code, but only about PHPDoc blocks. This makes a lot of what I said before invalid. I'm sorry for the confusion.

I was thinking of code that contains something like @param ClassA|ClassB where the two classes don't share a common base class. Using a strict object type hint in the function header is helpful in this case.

I agree that @param object is odd. I can't think of many cases where this makes sense:

  • When the code using the object uses it as if it is an stdClass object (i.e. only accessing arbitrary properties), the @param tag should just say that.
  • When the code does method calls on the object, it must be possibly to list the valid class names.
  • Sometimes both happens, e.g. when an arbitrary property is used to cache something in the Parser object. The type hint should still mention the class then, but neither object nor stdClass.
  • I can think of code that implements a factory, cache, facade, or any kind of indirection where a huge number of classes should pass through, possibly including classes that don't yet exist when this code is written. There are a few options then: 1. List all classes, and update it every time a new one is added. 2. Make the classes share a common interface. 3. Don't have any type hint (i.e. use mixed). 4. Use object. Note this is not the same as not having any documentation. For example, @return object tells me a method will never return null.
  • Code that is meant to accept anonymous classes. new class {} is not an instanceof stdClass. The only possibly type hint is object then, I believe.
  • I'm one of many users that try to improve code they don't own. One of our never ending tasks is adding missing documentation. Sometimes it's really hard to fully understand a piece of code. Still we want to improve it. In such cases it might be helpful to document something as object, instead of making a possibly wrong guess. At least as a step, in the hope a future patch will improve the documentation even more.

I'm not sure about the last few cases. Even if there are only "a couple of places" where there is no better way than documenting something as object, it would feel odd to have to disable this sniff then. Do we have any other sniff that is known to report correct usages?

I realized I misunderstood this task. This is not about disallowing object as a strict type hint in code, but only about PHPDoc blocks. This makes a lot of what I said before invalid. I'm sorry for the confusion.

Ah, I'm sorry, I should have seen this before, since you explicitly mentioned typehints.

I was thinking of code that contains something like @param ClassA|ClassB where the two classes don't share a common base class. Using a strict object type hint in the function header is helpful in this case.

Yes, as I said in the task description, for that case object is certainly better than nothing. Sometimes it might still make sense to fix it somehow (as mentioned in the description), but we certainly shouldn't disallow this use case.

I agree that @param object is odd. I can't think of many cases where this makes sense:

  • When the code using the object uses it as if it is an stdClass object (i.e. only accessing arbitrary properties), the @param tag should just say that.
  • When the code does method calls on the object, it must be possibly to list the valid class names.
  • Sometimes both happens, e.g. when an arbitrary property is used to cache something in the Parser object. The type hint should still mention the class then, but neither object nor stdClass.

I agree 100% with this analysis.

  • I can think of code that implements a factory, cache, facade, or any kind of indirection where a huge number of classes should pass through, possibly including classes that don't yet exist when this code is written. There are a few options then: 1. List all classes, and update it every time a new one is added. 2. Make the classes share a common interface. 3. Don't have any type hint (i.e. use mixed). 4. Use object. Note this is not the same as not having any documentation. For example, @return object tells me a method will never return null.

Exactly. The only use cases I can think of are ObjectFactory and StubObject, but that's just two classes after all. And I also agree with the cases breakdown.

  • Code that is meant to accept anonymous classes. new class {} is not an instanceof stdClass. The only possibly type hint is object then, I believe.

This is an interesting example that I didn't think of. I searched a bit on the internet, and apparently there's no standard way to document this case. This is probably something that should be addressed upstream, but in the meanwhile I'd say this is an accepted use case for object. Note, however, that this is presumably a very rare case. You'd need not only to deal with anonymous classes (which is usually unnecessary), but also with anonymous classes *that do not extend any class, nor implement any interface*. I cannot see any obvious use case for such a generic construct.

  • I'm one of many users that try to improve code they don't own. One of our never ending tasks is adding missing documentation. Sometimes it's really hard to fully understand a piece of code. Still we want to improve it. In such cases it might be helpful to document something as object, instead of making a possibly wrong guess. At least as a step, in the hope a future patch will improve the documentation even more.

An alternative to suppressing the issue here would be to add an object typehint, without a @param annotation. This would still be an improvement, and wouldn't require immediately searching what the type is meant to be precisely (which can be done at a later time).

I'm not sure about the last few cases. Even if there are only "a couple of places" where there is no better way than documenting something as object, it would feel odd to have to disable this sniff then. Do we have any other sniff that is known to report correct usages?

There are two interesting questions here. The first one is: how many pieces of code would need to suppress the issue? Because if it's like 10 places out of everything that's on codesearch, this wouldn't be bad. As for your question, it highly depends on what a "correct usage" is. There are some instances where the sniff might suggest a subpar solution (e.g. in some while loops it might make sense to use an assignment in the loop condition). However, if we want to make a strict distinction of correct vs incorrect, I cannot really think of any example, especially if we're only interested in sniffs that "prevent correctness" intendedly (and not as a result of some other limitation, e.g. T204210). Perhaps we might get some insights by looking at the list of suppressed issues for each repo. I can remind of a list of such suppressions, perhaps maintained by LibUp, but cannot find it right now.

it highly depends on what a "correct usage" is.

My argument is that object is never wrong.

Hm, I guess this is the same for most, if not all other sniffs. For example, missing documentation doesn't make the code wrong – still we have a sniff that complains about it.

You'd need […] to deal […] with anonymous classes that do not extend any class, nor implement any interface.

Uh, that's a lovely argument. The use-case I was thinking of is to use an anonymous class as a test mock or spy. And indeed, it should be possible to make it implement an interface, which means there is no need for object.

One of our never ending tasks is adding missing documentation. […] it might be helpful to document something as object […]

With anonymous classes gone, this is my last remaining argument. I realized it's easy to solve: I can simply <exclude> this sniff as long as there is documentation in a codebase that mentions object.

it highly depends on what a "correct usage" is.

My argument is that object is never wrong.

Hm, I guess this is the same for most, if not all other sniffs. For example, missing documentation doesn't make the code wrong – still we have a sniff that complains about it.

Yeah, this is exactly what I meant.

One of our never ending tasks is adding missing documentation. […] it might be helpful to document something as object […]

With anonymous classes gone, this is my last remaining argument. I realized it's easy to solve: I can simply <exclude> this sniff as long as there is documentation in a codebase that mentions object.

Yes, that's a good approach. Alternatively, to expand on my previous comment: the example you mentioned is based on the assumption that the docblocks for a given class are missing or incomplete (and thus, the sniffs that check for docblock completeness are already disabled). In this situation, it's possible to avoid adding the object parameter without having to suppress any additional sniff. At the same time, it's still possible to improve the method by adding object as a typehint -- that is, temporarily, until someone wants to investigate what the possible types are and replace it with something more appropriate.

Change 632993 merged by jenkins-bot:
[mediawiki/tools/codesniffer@master] Expand FunctionCommentSniff to error on object typehints

https://gerrit.wikimedia.org/r/632993

DannyS712 claimed this task.
DannyS712 removed a project: Patch-For-Review.

Should the same apply to object[]?

Should the same apply to object[]?

Probably

Change 636964 had a related patch set uploaded (by DannyS712; owner: DannyS712):
[mediawiki/tools/codesniffer@master] Expand FunctionCommentSniff to error on object[] typehints

https://gerrit.wikimedia.org/r/636964

Change 636964 merged by jenkins-bot:
[mediawiki/tools/codesniffer@master] Expand FunctionCommentSniff to error on object[] typehints

https://gerrit.wikimedia.org/r/636964

Should the same apply to object[]?

Huh. Perhaps it would be a good idea to adopt some kind of tokenization of phpdoc types and run our checks on that, so we only have to worry about atomic units. But this is just me dreaming. Well done!

EDIT: I'm saying this because, even if I didn't check deeply, I'm fairly sure that we're still leaving out edge cases like (string|object)[].

Change 637591 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/tools/codesniffer@master] Release v33.0.0

https://gerrit.wikimedia.org/r/637591

Change 637591 merged by jenkins-bot:
[mediawiki/tools/codesniffer@master] Release v33.0.0

https://gerrit.wikimedia.org/r/637591