Page MenuHomePhabricator

Enforce `use function` import for array_key_exists() (and other common PHP functions?) in namespaced files via PHPCS
Closed, DuplicatePublic

Description

What:
MediaWiki Codesniffer should enforce that files which are namespaced and use array_key_exists() include the following import near the top of the file:

use function array_key_exists;

Why:
Without this, a call to array_key_exists() within a file in e.g. namespace MediaWiki\Config is ambiguous between \MediaWiki\Config\array_key_exists(), \MediaWiki\array_key_exists() and \array_key_exists(); in theory PHP has to check each time the function is called whether the former two functions exist before calling the latter, built-in function, as the namespaced functions may be defined at any time during program execution. (I don’t know if PHP does something smart like tracking an assumption that the namespaced functions don’t exist, and invalidating that assumption when they’re defined; my guess is it doesn’t.) By declaring that we mean the unnamespaced function at the beginning of the file, we skip this lookup everywhere. (This is true for any PHP function we call from namespaced files.)

Furthermore, array_key_exists() in particular has an opcode (ZEND_ARRAY_KEY_EXISTS; merged commit 1, 2) that can be saved in the opcache instead of a generic function call instruction, further optimizing the call; this is also only possible if it’s known at compile time (syntactically) that the call must refer to the unnamespaced, built-in PHP function.

See also SiteConfiguration: Use \array_key_exists() and SiteConfiguration: Use 'use function' syntax for more discussion.

Open questions:

Which functions should we enforce this for? zend_try_compile_special_func_ex has a list of special-cased functions (currently 15), from strlen() to clone(); all of those? Fewer than that? More than that?

Should we enforce this in all MediaWiki codebases, or should the rule be disabled by default (but available in the plugin), to be enabled via .phpcs.xml for specific files or directories thought to contain “hot” code?

Should we enforce this in all files, even unnamespaced ones, so that moving a file to a namespace won’t include a seemingly unrelated use function diff?

Event Timeline

I would love to see what the real-world impact of this is. Because enforcing that for a function (in contrast to classes where we already generally do this) would be quite surprising, inconvenient, and generally awkward, in my opinion. That should be justified.

Ori said that “With [fully-qualified array_key_exists() calls], getAll() for enwiki goes from 0.952 ms to 0.688 ms, ~40% faster!” (IRC).

Calling SiteConfiguration::getAll probably implies that array_key_exists gets called several thousand times. Which means that a single array_key_exists call is probably something like a few nanoseconds faster. That can only ever get relevant in very tight loops, can't it? Is this really worth a generic PHPCS rule when we know it won't make a difference in 99.9% of the cases?

A few points in favor of a global rule:

  • The long tail of call sites (the "99.9%") can add up to something meaningful.
  • We don't know where all the hot paths are now, and (critically) we can't predict future ones. A global rule frees us from having to rediscover this optimization.
  • It requires only a single-line use declaration in the header of files that use the function. It doesn't clutter the code body or hurt readability. It doesn't change how the code is written.
  • It would be straightforward to make this auto-fixable via phpcs.

@thiemowmde does that address your concerns? I'm not sure I understand why you think it would be inconvenient and generally awkward. (Surprising it may be, but only once :).) It seems like a very low-friction, low-cost opportunity to me.