Page MenuHomePhabricator

Using fully-qualified function calls is faster
Open, LowPublic

Description

Original task description by @Reedy in 2018:
New task description by @Lucas_Werkmeister_WMDE in Jan 2026:

What:
MediaWiki Codesniffer should enforce that files which are namespaced and use array_key_exists() include the following import near the top of the file:

use function array_key_exists;

Why:
Without this, a call to array_key_exists() within a file in e.g. namespace MediaWiki\Config is ambiguous between \MediaWiki\Config\array_key_exists(), \MediaWiki\array_key_exists() and \array_key_exists(); in theory PHP has to check each time the function is called whether the former two functions exist before calling the latter, built-in function, as the namespaced functions may be defined at any time during program execution. (I don’t know if PHP does something smart like tracking an assumption that the namespaced functions don’t exist, and invalidating that assumption when they’re defined; my guess is it doesn’t.) By declaring that we mean the unnamespaced function at the beginning of the file, we skip this lookup everywhere. (This is true for any PHP function we call from namespaced files.)

Furthermore, array_key_exists() in particular has an opcode (ZEND_ARRAY_KEY_EXISTS; merged commit 1, 2) that can be saved in the opcache instead of a generic function call instruction, further optimizing the call; this is also only possible if it’s known at compile time (syntactically) that the call must refer to the unnamespaced, built-in PHP function.

See also SiteConfiguration: Use \array_key_exists() and SiteConfiguration: Use 'use function' syntax for more discussion.

Open questions:

  • Which functions should we enforce this for? zend_try_compile_special_func_ex has a list of special-cased functions (currently 15), from strlen() to clone(); all of those? Fewer than that? More than that?
  • Should we enforce this in all MediaWiki codebases, or should the rule be disabled by default (but available in the plugin), to be enabled via .phpcs.xml for specific files or directories thought to contain “hot” code?
  • Should we enforce this in all files, even unnamespaced ones, so that moving a file to a namespace won’t include a seemingly unrelated use function diff?

Event Timeline

There's also use function, which appears to avoid this problem (based on https://wiki.php.net/rfc/use_function). Though I note that @Smalyshev voted against that RfC, maybe he has an opinion on how to handle this issue? :)

I am not big fan of namespaced functions, since if the function is part of well identified module suitable for reuse, which isn't that module a class, which we can very well namespace?

Also, while fully-qualified call of a namespaced function may be technically faster, I seriously doubt it would have a performance impact in application as large as Mediawiki, unless the said function is in a critical path and is called thousands of times and is very short and CPU-bound. Do we have any performance data suggesting this is a problem worth considering? Usually micro-optimizations like that aren't a very good idea, they do not improve overall roundtrip time, hard to maintain and ultimately lead to worse code.

Also, I must note the second article is looking at debug opcodes and making conclusions from it. This is wrong, those aren't opcodes the real engine runs. I am not sure what exactly would be the opcodes after optimizer is done, but pretty sure not that is displayed there (for one, NOP's and EXT's will be gone).

In general, I personally would advise against any micro-optimizations unless they are proven - on real Mediawiki code - to produce real-life speedups. My experience with optimizing large-app code suggests usually micro-optimizing is a waste of time, and trying to outsmart the compiler backfires in the next version.

In general, I personally would advise against any micro-optimizations unless they are proven - on real Mediawiki code - to produce real-life speedups. My experience with optimizing large-app code suggests usually micro-optimizing is a waste of time, and trying to outsmart the compiler backfires in the next version.

Is it really trying to outsmart the compiler is all we're doing is making it explicit it's in the global namespace? Surely it's helping it...

Considering the change is basically s/myFunctionCall/\myFunctionCall/ and then enforcing it with phpcs or similar, I don't really think that's a maintenance burden

I do agree it might be worth doing some further benchmarking/similar to see if it does make a worthwhile difference though

aaron triaged this task as Low priority.Jul 30 2018, 8:09 PM

Without some evidence that it would help with latencies, this is difficult to act on. E.g. a local patch and run with the Parser benchmark to see if it makes a meaningful difference.

In any event, moving to the Radar as perf team isn't currently interested in working on this. However, if a benchmark shows that it is worthwhile performance-wise, it might make sense to codify, enforce and auto-fix via PHPCS. Anything short of that would likely not be worth the continuous human effort, however.

Ori said that “With [fully-qualified array_key_exists() calls], getAll() for enwiki goes from 0.952 ms to 0.688 ms, ~40% faster!” (IRC).

Calling SiteConfiguration::getAll probably implies that array_key_exists gets called several thousand times. Which means that a single array_key_exists call is probably something like a few nanoseconds faster. That can only ever get relevant in very tight loops, can't it? Is this really worth a generic PHPCS rule when we know it won't make a difference in 99.9% of the cases?

A few points in favor of a global rule:

  • The long tail of call sites (the "99.9%") can add up to something meaningful.
  • We don't know where all the hot paths are now, and (critically) we can't predict future ones. A global rule frees us from having to rediscover this optimization.
  • It requires only a single-line use declaration in the header of files that use the function. It doesn't clutter the code body or hurt readability. It doesn't change how the code is written.
  • It would be straightforward to make this auto-fixable via phpcs.

Prior art: https://cs.symfony.com/doc/rules/function_notation/native_function_invocation.html

@thiemowmde does that address your concerns? I'm not sure I understand why you think it would be inconvenient and generally awkward. (Surprising it may be, but only once :).) It seems like a very low-friction, low-cost opportunity to me.

I think an auto-fix should be a requirement if we want to make this a rule.

My only counter argument is that I value the ability for contributors to eventually be able to write (most) patches correctly in terms of coding style without needing to go through a mandatory cycle of (submit to Gerrit, wait for CI, get lint failure, run composer fix, re-submit) which can take upto 20 minutes every time. And that's before we get to more useful CI feedback, like an integration test with a MW extension, a PHP version, or a DB type, that I wouldn't discover locally; which is feedback you can't get until after the second CI submission.

Today, the medium-high standard I hope people try to meet before submitting to Gerrit is that code "works" in their browser, and something like composer phpunit -- tests/phpunit/includes/MyCurrentArea passes locally on their PHP version. I don't think running composer phpcs locally should become a requirement in local iteration in order to feel that a patch is likely to pass coding style requirements. But... maybe we can get used to this rule and remember it when writing code day-to-day?

There are two opposing forces in how the rule can be configured:

  • Narrow configuration: Limit enforcement to compiler-optimised functions. This has the greatest performance gains for the minimum disruption. Except, it feels arbitrary for most contributors because it is hard for humans to remember, and will also change with time.
  • Wide configuration: Simply enforce it for all built-in global functions. This is easier to remember, but also means you have to deal with it more often.

The long tail of call sites (the "99.9%") can add up to something meaningful.

My point was that it can't. While one array_key_exists that is executed in a complex loop can make a difference, a thousand array_key_exists that are only executed once can't make any meaningful difference. Especially not when the context is code that does way more complex things that take way more time. Optimizing a single SQL query probably saves more time and CO₂ than such a micro-optimization.

A much better solution would be to disable the feature in the PHP compiler that looks for known function names in unrelated namespaces. I don't know if this is possible. But I'm quite certain there is no code that makes use of this (i.e. declaring their own array_key_exists in a namespace) anywhere in our ecosystem.

[…] why you think it would be inconvenient […]

Because of what @Krinkle explained above. Most people will only discover this after they got an error from the CI.

[…] A much better solution would be to disable the feature in the PHP compiler that looks for known function names in unrelated namespaces. […]

It doesn't look in an unrelated namspace. It looks specifically in the current namespace, and then falls back to the global scope. Are you suggesting PHP generally removes the ability to define functions under a namespace?

The other way to approach it would be to remove the fallback to global scope, and thus require that function references are unambigious, the same way class references are already today. (In a namespace Foo, calling Bar::quux does not try Foo\Bar first and then global Bar. It only tries Foo\Bar and fails if that doesn't exist.)

That was proposed at https://wiki.php.net/rfc/fallback-to-root-scope-deprecation, along with this 2018 discussion on php-internals mailing list. This would remove runtime fallback to the global scope, thus making a header like use array_key_exists; mandatory to call a global function, the same way that a use header is already mandatory when calling global classes from namespaced code.

While the added headers would be annoying, it solves the issue I raised because it would be consitent and thus discoverable during development. It's not an issue today that global class references require use headers, either.

I wonder if upstream might be open to making this an opt-in setting (php.ini directive), or a file-level directive like declare(strict_types=1);, if changing the default would not gain traction.

Even with a directive like that, I'm not sure that we can make it "just work" without use-headers. Suppose the following:

declare(strict_performance=1);

namespace MediaWiki\Config;

classs SiteConfiguration {
  public function get() {
    return array_key_exists();
  }
}

It remains ambiguous from the interpreter perspective. Perhaps a JIT optimization is possible. But, such an optimization would need to subscribe to the autoloader and invalidate/regenerate the JIT output after a new file is compiled if it contains the same namespace, and if it defines one or more functions. I don't know if there is precedent in the PHP JIT today for invalidating/regenerating unchanged source code at runtime in response to a "distant" action. In theory it seems possible. In which case we don't need "use"-headers, don't need a new strict mode, and don't need to remove support for global fallbacks.

  • Should we enforce this in all files, even unnamespaced ones, so that moving a file to a namespace won’t include a seemingly unrelated use function diff?

No. Namespacing a file likely includes seemingly unrelated changes to use class statements as well (other classes in the global namespace, including SPL classes, need to be added, classes in the target namespace can be removed), functions are no different. Functions from the same namespace the code is in shouldn’t be used, since that’s the first choice anyway – if the code is in the global namespace, functions from the global namespace shouldn’t be used. If this becomes the norm, it wouldn’t be that surprising anyway. (I don’t want to comment on whether it should become the norm, though. I wouldn’t mind it, but neither am I completely convinced it’s necessary.)

Are you suggesting PHP generally removes the ability to define functions under a namespace?

Only for the (very few) known function names we are talking about. This would be similar to making them reserved keywords, I guess.

The use of static closures (T274038: Enforce use of static closures) in mediawiki codestyle could be an suprise as well via CI for developer not familiar with mediawiki code (used in most deployed extensions as optimisation).

But many bigger php frameworks seems to use the use clause or \ to force use of global functions everywhere (also for non-optimised function)

How this could look for the rdbms code: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1237346

Current list (as of php8.5):

use function array_key_exists;
use function array_map;
use function array_slice;
use function boolval;
use function call_user_func;
use function call_user_func_array;
use function chr;
use function count;
use function defined;
use function doubleval;
use function floatval;
use function func_get_args;
use function func_num_args;
use function get_called_class;
use function get_class;
use function gettype;
use function in_array;
use function intval;
use function is_array;
use function is_bool;
use function is_double;
use function is_float;
use function is_int;
use function is_integer;
use function is_long;
use function is_null;
use function is_object;
use function is_resource;
use function is_scalar;
use function is_string;
use function ord;
use function printf;
use function sizeof;
use function sprintf;
use function strlen;
use function strval;
use function array_key_exists;
use function array_map;
use function array_slice;
use function boolval;
use function call_user_func;
use function call_user_func_array;
use function chr;
use function count;
use function defined;
use function doubleval;
use function floatval;
use function func_get_args;
use function func_num_args;
use function get_called_class;
use function get_class;
use function gettype;
use function in_array;
use function intval;
use function is_array;
use function is_bool;
use function is_double;
use function is_float;
use function is_int;
use function is_integer;
use function is_long;
use function is_null;
use function is_object;
use function is_resource;
use function is_scalar;
use function is_string;
use function ord;
use function printf;
use function sizeof;
use function sprintf;
use function strlen;
use function strval;

Some function have alternative syntax in php to reduce function call:

  • (cast) as alternative for boolval, floatval, intval, strval
  • call_user_func, call_user_func_array, func_get_args and func_num_args can use php syntax (see T385518)
  • get_class could be written as self::class, get_called_class as static::class
  • is_null can be written as comparision to null (already enforced)
  • Some are already forbidden by a sniff (sizeof, doubleval, is_resource, is_integer, is_double, is_long)
  • Not always and non-trivial, but array_map could be written as loop
  • Not always and non-trivial, but count could be sometimes omitted (falsy conditions) or replaced with comparision to empty array
  • Not always and non-trivial, but count could be sometimes omitted (falsy conditions) or replaced with comparision to empty array

As long as the parameter is an array, not a \Countable object.


Optimizable calls to printf() and sprintf() can also be replaced by syntax – as long as the format string is constant and only contains plain %ss and %ds (and %%s), string interpolation (combined with echo in case of printf()) can be used instead. If the format string contains anything else, it wouldn’t be optimized anyway. Such simple usage of sprintf() turns out to be more common in MediaWiki core than I would’ve expected. (printf() is much more rarely used – since printing to standard output makes sense only in tests and maintenance scripts –, so simple cases are also rarer. I found three simple calls: two in maintenance/findDeprecated.php and one in tests/parser/fuzzTest.php – the latter could actually use var_dump(), couldn’t it?)

The use of static closures (T274038: Enforce use of static closures) in mediawiki codestyle could be an suprise as well […]

I agree, and I would similarly not recommend that rule. In terms of impact I do believe that the closure rule is less likely to impact contributors, as we have about an order of magnitude more source files that call a built-in function, than have a closure. Most files don't use closures.