Page MenuHomePhabricator

Enable sniff to disallow mixing different boolean operators in the same expression without parentheses
Closed, ResolvedPublic

Assigned To
Authored By
Daimona
Mar 2 2024, 10:27 PM
Referenced Files
None
Tokens
"Dislike" token, awarded by Lucas_Werkmeister_WMDE."Like" token, awarded by Dringsim."Like" token, awarded by matej_suchanek.

Description

The description of this task has to begin with a confession; take it as a user story, if you want. If I find code like the following (source):

if ( !$attribs['rc_patrolled']
	&& ( $useRCPatrol || $attribs['rc_type'] == RC_NEW && $useNPPatrol ) // <-- relevant line
) {
	$flag .= '!';
}

I don't know what it does. I just do not remember what the operator precedence is; maybe I never even learned it. Is it || before && or the other way around? Whenever I find such code, I generally do one of two things: 1) go check the manual to see what the order is (and promptly forget it afterwards), or 2) trust the author of the code to know operator precedence better than me, and hope that the code does what it looks like it was meant to do (which is probably the case if I'm looking at code written in 2004). Clearly, this is subpar: I have to spend a decent amount of time parsing the code in question. But then I think, maybe it's just me. After all, there are so many things that I have to re-learn often because I forget them immediately.

But now that PHPCS 3.9.0 is out and it introduces a new Generic.CodeAnalysis.RequireExplicitBooleanOperatorPrecedence sniff, I think it's time for me to actually ask this and see if it's just me--or actually, ask: would it be a problem if we enabled this new rule to forbid mixing boolean operators without parentheses? To clarify, the code above would be flagged, and the accepted way of rewriting it (no autofix available) would be:

if ( !$attribs['rc_patrolled']
	&& ( $useRCPatrol || ( $attribs['rc_type'] == RC_NEW && $useNPPatrol ) )
) {
	$flag .= '!';
}

where the precedence is made very clear by the parentheses. I think the extra parentheses are well worth the clarity, as well as the reassurance that the code is doing exactly what we want it to do. Still, there are a few occurrences of this pattern (at least in MW core, see patch attached to this task) and we don't have a sniff for it, so I'm wondering if I'm missing some potential downsides.

Event Timeline

Change 1008057 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/codesniffer@master] Add Generic.CodeAnalysis.RequireExplicitBooleanOperatorPrecedence

https://gerrit.wikimedia.org/r/1008057

Change 1008057 merged by jenkins-bot:

[mediawiki/tools/codesniffer@master] Add Generic.CodeAnalysis.RequireExplicitBooleanOperatorPrecedence

https://gerrit.wikimedia.org/r/1008057

Daimona claimed this task.

Since I never learned the precedence rules, for me and anyone else reviewing patches that fix this: && is higher precedence than || per https://www.php.net/manual/en/language.operators.precedence.php

The boolean algebra is similiar to what happens in arithmetic math

&& is like multiplication and || like addition, whenever a factor of the multiplication is 0, the product/result is 0 (false), while for the addition a 0 does not impact the sum/result.
Also multiplication is evaluated before addition.

https://en.wikipedia.org/wiki/PEMDAS

The boolean algebra is similiar to what happens in arithmetic math

&& is like multiplication and || like addition, whenever a factor of the multiplication is 0, the product/result is 0 (false), while for the addition a 0 does not impact the sum/result.
Also multiplication is evaluated before addition.

https://en.wikipedia.org/wiki/PEMDAS

I think in general it is better to ask what is the idea behind the rule. Or to improve the code that is hard to understand on a case by case basis. I am regularly confused if more than two brackets are used. Adding new lines helps in many cases, but adding new rules might make things worse in other cases.

The boolean algebra is similiar to what happens in arithmetic math

Right, I guess that's one way to remember it. I guess personally, I don't tend to do this conversion naturally when I read code.

Or to improve the code that is hard to understand on a case by case basis. I am regularly confused if more than two brackets are used.

I think the autofixer isn't meant to be perfect; it's not even guaranteed that the code author intended the code to behave as it does. Fixing this manually is always a good idea in this cases, and rules like this one might highlight code that would actually benefit from being refactored to avoid long conditionals in the first place.

I think "Programs must be written for people to read, and only incidentally for machines to execute." is an excellent guiding principle for code reviews. Today, it is pretty standard that one uses tools like Grammarly who suggest changes to text (for people). Still, I also think one should not enforce too many Wikimedia-specific house rules to make it as easy as possible for other open-source projects to contribute.

I wholeheartedly agree, which is why I proposed this change. Operator precedence is something that I just can't remember naturally (like many other dichotomies); for me, unparenthesized expressions like these make the code harder to read (or trust).

Still, I also think one should not enforce too many Wikimedia-specific house rules to make it as easy as possible for other open-source projects to contribute.

I don't think this is quite Wikimedia-specific, though. In fact, this sniff was accepted in upstream codesniffer. As another example, the PHP Inspections (EA Extended) plugin for PHPStorm also highlights unparenthesized expressions mixing && and || by default, with the maximum error severity. Again, these are obviously just examples, but it's just to say, I don't think it would be just Wikimedia.

@Daimona, while risking getting a bit off-topic, let me add one more comment: I think it would be better to adopt well-established rules instead of switching on or off inspections on a case-by-case basis. For example, if you have reserved one hour a week to work on Wikimedia-related activities, spending half an hour dealing with this new rule is quite a significant investment of time. Maybe I also overestimate the impact of that change. So maybe a good practice would be to accept all sniffs from the upstream code-sniffer and lower the warning level for the new rules that create problems. The change https://gerrit.wikimedia.org/r/c/mediawiki/tools/codesniffer/+/1008057 suggests that we act on a case-by-case basis and also I could not find in the documentation what upstream sniffer ideas we follow.

@Daimona, while risking getting a bit off-topic, let me add one more comment: I think it would be better to adopt well-established rules instead of switching on or off inspections on a case-by-case basis.

Adhering to an existing coding standard (like PSR-12) might be a noble goal, but I'm sure there would be many issues with it and it's probably not going to happen... OTOH, given the size (and prominence in the PHP world) of MediaWiki, I think a custom coding standard is not unreasonable. The concept shouldn't be new if someone is familiar with WordPress Coding Standards, or Slevomat Coding Standard, or Magento Coding Standard, or CakePHP Code Sniffer, to name some of the most used dependents of PHPCS.

For example, if you have reserved one hour a week to work on Wikimedia-related activities, spending half an hour dealing with this new rule is quite a significant investment of time. Maybe I also overestimate the impact of that change.

The way we mitigate that is to always provide an autofixer when possible, and upstream does the same. Ideally, a developer could write the code in whatever coding style they prefer, and then run PHPCBF so that it gets autofixed to comply with the MW standards[1]. Clearly, this isn't a viable option if someone spends a lot of time on MW, in which case it would probably just be easier to get used to the MW coding standards, but it should be fine for an occasional contributor. Certain rules can also be excluded (or included) for individual code bases, if the maintainers wish to, although that comes at the price of being (potentially only slightly) inconsistent with the overarching MW standards.


[1] - Note, this doesn't work for everything, but it can get pretty close.

Thank you. I am convinced eventually, and I tend to run composer fix before each commit :-) Maybe the documentation could be reordered according to the structure of the PSR-12. Even if we have another convention for some aspects like 2.4, use tabs, not spaces.