Page MenuHomePhabricator

Hook handler functions should not require pass-by-reference unless documented in the hook signature
Open, LowPublic

Description

Hook handler callback functions should only require such parameters to be passed per reference that are documented to be replacable.

Event Timeline

@thiemowmde has been doing some cleanup in this area lately. I wonder if we have a good way of identifying hooks that are passing things by reference for legacy reasons and shouldn't be any more so we can come up with a list of extensions that need updating. (And a way to emit deprecation notices ideally).

The parent ticket T193950 sums it up very nicely: The majority of these & are legacy from old PHP versions. Unfortunately all callees (hook handlers) must be updated first, before an & can be removed from the caller (the hook).

I started cleaning up a series of extensions I care about, e.g. https://gerrit.wikimedia.org/r/433174. The longest commit message I wrote so far is this:

Don't expect objects by reference in hook handlers

The motivation for this patch is to make the code less complex, better readable, and less brittle. Example:

public function onExampleHook( Parser &$parser, array &$result ) {
    /* This is the hook handler */
}

In this example the $result array is meant to be manipulated by the hook handler. Changes should become visible to the caller. Since PHP passes arrays by value, the & is needed to make this possible.

But the & is misplaced in pretty much all cases where the parameter is an object. The only reason we still see these & in many hook handlers is historical: PHP 4 passed objects by value, which potentially caused expensive cloning. This was prevented with the &.

Since PHP 5 objects are passed by reference. However, this did not made the & entirely meaningless. Keeping the & means callees are allowed to replace passed objects with new ones. The & makes it look like a function might intentionally replace a passed object, which is unintended and actually scary in cases like the Parser. Luckily all Hooks::run I have seen so far ignore unintended out-values. So even if a hook handler tries to do something bad like replacing the Parser with a different one, this would not have an effect.

Removing the & does not remove the possibility to manipulate the object. Changes done to public properties are still visible to the caller.

Unfortunately these & cannot be removed from the callers as long as there is a single callee expecting a reference. This patch reduces the number of such problematic callees.

Identifying extensions is hard. We can easily identify some of the worst patterns by searching for &$parser, &$skin, &$sk and such: https://codesearch.wmflabs.org/things/?q=%26\%24parser. But coming up with a complete list means we need to list all affected hooks, identify extensions that use them, and check if the handler functions (that are not necessarily named after the hook) contain unwanted &. I feel this is better done via an official deprecation phase, and a release that removes all unwanted & from core in one big bang.

thiemowmde added a project: User-thiemowmde.

(And a way to emit deprecation notices ideally).

I think Reflection would be the only way for MW to determine if a hook function is expecting a deprecated reference value so it could log a deprecation warning. Make a ReflectionMethod or ReflectionFunction as appropriate,[1] call ->getParameters() to get the ReflectionParameter objects, then check ->isPassedByReference() on the ReflectionParameter at the deprecated index.[2]

[1]: Determine $callback as in Hooks::callHook(), then use new ReflectionMethod( $callback[0], $callback[1] ) if it's an array, new ReflectionMethod( $callback ) if it's a string containing '::', or new ReflectionFunction( $callback ) otherwise.
[2]: Don't forget to adjust the index being checked by the length of $hook as it is near the end of Hooks::callHook().

I started cleaning up a series of extensions I care about, e.g. https://gerrit.wikimedia.org/r/433174. The longest commit message I wrote so far is this:

That's quite a verbose message.

But coming up with a complete list means we need to list all affected hooks,

IMO that's the better place to begin. There are fewer Hooks::run calls to check.

identify extensions that use them, and check if the handler functions (that are not necessarily named after the hook)

Unless someone is doing something weird with constructing hook names with concatenation you should be able to find them all by searching for the hook name. The vast majority are probably either in extension.json or $wgHooks['HookName'].

That'll generally also give you the exact name of the handler function you need to look at.

Change 437904 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/core@master] Hooks: Support deprecating arguments passed by reference

https://gerrit.wikimedia.org/r/437904

Vvjjkkii renamed this task from Hook handler functions should not require pass-by-reference unless documented in the hook signature to sjdaaaaaaa.Jul 1 2018, 1:11 AM
Vvjjkkii raised the priority of this task from Low to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from sjdaaaaaaa to Hook handler functions should not require pass-by-reference unless documented in the hook signature.Jul 2 2018, 3:30 PM
CommunityTechBot lowered the priority of this task from High to Low.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added subscribers: gerritbot, Aklapper.

I don't see how this is connected to the codebase of a specific extension, hence removing MediaWiki-extensions-Other. Feel free to clarify.

PHP 5.6 introduced the spread operator, which comes with a hack for ignoring reference requirements. (This is not in the docs but mentioned in the RfC.)

function myHook( &$x, &$y ) {}
user_call_func_array( 'myHook', [ &$x, &$y ] ); // works, passed by reference
user_call_func_array( 'myHook', [ $x, $y ] ); // raises notice, passed by value
('myHook')(...[ $x, $y ]); // works, passed by value

HHVM raises a warning about it, though, so we'd have to wait until after we drop HHVM support before using it.

Also note that

$args = [ $a, $b, $c ];
$func( ...$args );

doesn't modify $a, $b, or $c but it does modify $args, so you can't reuse the same $args array for multiple calls. Fortunately our Hooks class is already not reusing the args array for multiple calls.

Yeah, the spread does pass in the array by reference for reference parameters; it's just that the reference will be to &$args[0] and not $a (unless you define it as $args = [ &$a, $b, $c ] in which case the reference will be passed through).

We could use that principle without spread as well: https://3v4l.org/nlVpT

PHP 5.6 introduced the spread operator, which comes with a hack for ignoring reference requirements. (This is not in the docs but mentioned in the RfC.)

Oh, nice hack :)

But all of this would mean we stop passing *any* parameters by reference. We can't do that, we do need reference semantics for some hooks. We'd have to somehow signal to the run() method which parameters are *actually* references.

Also, ideally, we'd have a way to warn when a parameter that has historically been passed as a reference but is not intended to be modified was modified by the hook handler. This situation should not be ignored silently.

But all of this would mean we stop passing *any* parameters by reference. We can't do that, we do need reference semantics for some hooks. We'd have to somehow signal to the run() method which parameters are *actually* references.

It wouldn't. See the 3v4l link above - that works in HHVM and works with the current convention of calling Hooks::run, and can differentiate between reference and non-reference.

Also, ideally, we'd have a way to warn when a parameter that has historically been passed as a reference but is not intended to be modified was modified by the hook handler. This situation should not be ignored silently.

That would require passing in to Hooks::run some kind of map of which parameters allow replacement. Not hard but ugly.

Change 437904 abandoned by Legoktm:
Hooks: Support deprecating arguments passed by reference

https://gerrit.wikimedia.org/r/437904