Page MenuHomePhabricator

Hook container with strong types and DI
Open, MediumPublic

Description

Problem statement

This proposal aims to solve the following problems with the existing hook system:

  • There is no way to inject dependencies into a hook handler, so many handlers are obliged to access the global service container.
  • The types of the hook arguments are undeclared and unvalidated.
  • There is no place to put doc comments relating to hook functions; they are documented in a separate text file and on mediawiki.org. There is no editor integration with the separate documentation.
  • The global hook runner is implemented in static functions rather than a service.

Proposal summary

  • Hook interfaces: Have one interface per hook. Each interface has a single method with a name specific to the hook.
  • Indirect registration: Instead of having extensions register the function name associated with a hook, they register a handler name. Extensions also register an array of handlers, each handler being described by an ObjectFactory spec.
  • Handler dependency injection: As with the REST API, the "services" key in the ObjectFactory spec can be used to inject services into the hook handler constructor.
  • HookRunner: A HookRunner class in core provides a concrete implementation of all core hook interfaces. That is, for each hook, the same interface is used for both calling the hook and handling it. Each method consists of approximately one line of code which proxies the call to HookContainer. Extensions may implement their own HookRunner style classes along the same lines.
  • HookContainer: A service which is responsible for creating, caching and calling handler object instances.

Comparison with past work

  • T212482 Filters and Actions: The current proposal is complementary to the filters and actions RFC. The current proposal allows existing hooks to be remediated, as well as providing a precedent for registration and strong typing that can be used when filters and actions are implemented.
  • T154674 HookRunner: That proposal is conservative and leaves many of the problems with hooks unresolved. I agree with @Tgr's comment at T154674#2998357, and this proposal is very similar to the scheme he proposed there.

Tradeoffs and judgement calls

Having one interface per hook, with a unique method name for each hook, allows extensions to implement many different hooks in a single class, simplifying DI and allowing them to keep the same code structure as is currently typical. In smart code editors, adding an interface to a handler class automatically gives the developer the documentation and a method stub. The system makes it easy for extension developers to implement hooks.

One drawback of using interfaces is that it will no longer be possible to add parameters to existing hook interfaces. Instead, hook interfaces will need to be versioned.

Inevitably, some classes will have a very long "implements" clause. I think the clutter is a small price to pay. PHP might iterate over all parent interfaces in certain circumstances — we should verify that the performance is acceptable.

In T154674#2998357, @Tgr proposed that instead of interfaces, there would be one abstract class per module, for example, UploadHookRunner. This may be convenient for core callers of hooks, but it is not convenient to reuse such an abstract class to ensure strong typing in handlers. The point of using interfaces is to allow sharing of parameter signatures between callers and handlers.

On what level of modularity should HookRunner and hook interfaces be grouped? I am proposing one HookRunner class in core. I previously proposed grouping of all core interfaces under a single namespace, but the proposal has now been updated to place interfaces in the same namespace as the caller. I think it is convenient to keep masses of repetitive boilerplate code in one place. But it is not an essential point.

It's important that method names are globally unique. I suggest naming them after the hook name. If methods are not globally unique then the conflicting hooks cannot be implemented in the same handler class.

Registration

The registration system follows the way we did handler registration in the REST API, except that an extra level of indirection is added in the config, to avoid duplication of ObjectFactory specs. I think the "handler name" could be implicitly prefixed with the extension name, so that handler names would not need to be globally unique.

This sample code shows a very brief potential syntax in which handler names are brought into the same namespace as global functions. The registration system would search first for a hook handler with the corresponding name, and if one does not exist, it is assumed to be a global function. Alternatively, the namespaces could be separated by either changing the top-level "Hooks" key or by making the value be an object like {"handler":"*"}.

{
	"HookHandlers": {
		"*": {
			"class": "MyExtension\\Hooks",
			"services": [ "ActorMigration" ]
		}
	},
	"Hooks": {
		"UserLoginComplete" : "*"
	}
}

Deprecation and superseded hooks

Core will have a list of deprecated hooks, which extensions can add to via the DeprecatedHooks attribute. If an extension registers a handler for a deprecated hook, a deprecation warning will be raised when extension.json is loaded, unless the deprecation is acknowledged in extension.json:

{
	"Hooks": {
		"UserLoginComplete" : {
			"handler": "*",
			"deprecated": true
		},
		"UserLoginComplete2" : {
			"handler": "*"
		}
	}
}

Acknowledging deprecation in this way causes the handler to not be called. In this example, UserLoginComplete2 replaces UserLoginComplete, and calls to UserLoginComplete will be filtered if core is aware that UserLoginComplete is a deprecated hook.

For forwards compatibility, if a handler acknowledges deprecation but core is not aware of the hook in question being deprecated, the call will not be filtered. So in this example, the UserLoginComplete handler will still be called as long as a version of core is used in which it is not deprecated.

Sample handler code

<?php

namespace MyExtension;

use ActorMigration;
use UserLoginCompleteHook;

class Hooks implements UserLoginCompleteHook {
	private $actorMigration;
	
	public function __construct( ActorMigration $actorMigration ) {
		$this->actorMigration = $actorMigration;
	}
	
	/**
	 * @param \User $user
	 * @param string $injectedHtml
	 * @param bool $direct
	 * @return bool
	 */
	function onUserLoginComplete( \User $user, &$injectedHtml, $direct ) {
		// ...
	}
}

Sample core caller

Here is some incomplete sample code showing how legacy reference arguments could be dealt with during migration to the new system. UserLoginComplete is an existing hook which passes $user as a reference, for legacy reasons.

UserLoginComplete.php

<?php

interface UserLoginCompleteHook {
	/**
	 * @param \User $user
	 * @param string $injectedHtml
	 * @param bool $direct
	 * @return bool
	 */
	function onUserLoginComplete( \User $user, &$injectedHtml, $direct );
}

HookRunner.php

<?php

namespace MediaWiki\HookRunner;

use UserLoginCompleteHook;

class HookRunner implements
	// ... many interfaces would be added here ...
	UserLoginCompleteHook,
{
	/** @var HookContainer */
	private $hookContainer;

	public function __construct( HookContainer $hookContainer ) {
		$this->hookContainer = $hookContainer;
	}

	/**
	 * @param \User $user
	 * @param string $injectedHtml
	 * @param bool $direct
	 * @return bool
	 */
	function onUserLoginComplete( \User $user, &$injectedHtml, $direct ) {
		return $this->hookContainer->run( 'UserLoginComplete',
			[ $user, &$injectedHtml, $direct ],
			[ 'legacyRefArgs' => [ 0 ] ]
		);
	}
}

HookContainer.php

For simplicity, I have left out a lot of details present in Hooks::run(). The point is to sketch out the call flow.

<?php
namespace MediaWiki\HookRunner;

class HookContainer {
	// ...

	public function run( $name, $args, $options = [] ) {
		$legacyHandlers = $this->getLegacyHandlers( $name );
		if ( $legacyHandlers ) {
			$legacyArgs = $args;
			if ( isset( $options['legacyRefArgs'] ) ) {
				foreach ( $args as $i => $arg ) {
					if ( in_array( $i, $options['legacyRefArgs'] ) ) {
						$legacyArgs[$i] =& $args[$i];
					}
				}
			}
			foreach ( $legacyHandlers as $handler ) {
				$success = $handler( ...$legacyArgs );
				if ( !$success ) {
					return false;
				}
			}
		}

		$handlers = $this->getHandlers( $name );
		if ( $handlers ) {
			$funcName = 'on' . str_replace( ':', '_', $name );
			foreach ( $handlers as $handler ) {
				 $success = $handler->$funcName( ...$args );
				 if ( !$success ) {
					return false;
				 }
			}
		}

		return true;
	}
}

SpecialCreateAccount.php

@@ -44,6 +48,8 @@ class SpecialCreateAccount extends LoginSignupSpecialPage {
 
 	public function __construct() {
 		parent::__construct( 'CreateAccount' );
+		$this->hookRunner = new HookRunner(
+			MediaWikiServices::getInstance()->getHookContainer() );
 	}

@@ -139,14 +145,14 @@ class SpecialCreateAccount extends LoginSignupSpecialPage {
 		# Run any hooks; display injected HTML
 		$injected_html = '';
 		$welcome_creation_msg = 'welcomecreation-msg';
-		Hooks::run( 'UserLoginComplete', [ &$user, &$injected_html, $direct ] );
+		$this->hookRunner->onUserLoginComplete( $user, $injected_html, $direct );

Details

Related Gerrit Patches:
mediawiki/core : masterHook Container
mediawiki/core : masterAutomatically generated HookRunner
mediawiki/core : masterAutomatically generated hook interfaces
mediawiki/core : masterUse array literals when calling Hooks::run()

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 10 2019, 5:07 AM
tstarling updated the task description. (Show Details)Dec 10 2019, 5:23 AM
tstarling updated the task description. (Show Details)
TK-999 added a subscriber: TK-999.Dec 10 2019, 2:55 PM

I note that for cases where ExtensionFoo wants to add a listener for a hook from ExtensionBar but also wants to maintain the ability to be used when ExtensionBar is not installed, it will need to register a separate listener just for ExtensionBar's hooks because the requisite interfaces won't be available when ExtensionBar is not available. That seems easily possible with the design described here, but may warrant documentation as it may not be immediately obvious.

Extensions may implement their own HookRunner style classes along the same lines.
[...]
On what level of modularity should HookRunner and listener interfaces be grouped? I am proposing one HookRunner class in core, and grouping of all core interfaces under a single namespace. Having a single HookRunner in core means that only one service needs to be added to MediaWikiServices, instead of a collection of services or a factory which delivers a module-specific HookRunner.

This probably means that MediaWikiServices will wind up being cluttered with global services like "ExtensionFooHookRunner", "ExtensionBarHookRunner", "ExtensionBazHookRunner", and so on.

Is the HookRunner meant to actually have any persistent state that justifies it being an actual service, or anything to override for testing that couldn't as well be satisfied by an overridden HookContainer instead? If not, instead of a service it could be more of an internal utility class wrapping HookContainer, intended for uses by composition.

Some extension class that calls a hook
class MyExt {
    private $hookRunner;

    public function __construct( HookContainer $hookContainer ) {
        $this->hookRunner = new MyExtHookRunner( $hookContainer );
    }

    public function doThings() {
        /* ... */
        $this->hookRunner->onMyExtDoThings( $things );
        /* ... */
    }
}

I note that for cases where ExtensionFoo wants to add a listener for a hook from ExtensionBar but also wants to maintain the ability to be used when ExtensionBar is not installed, it will need to register a separate listener just for ExtensionBar's hooks because the requisite interfaces won't be available when ExtensionBar is not available.

Yes, true.

If not, instead of a service it could be more of an internal utility class wrapping HookContainer, intended for uses by composition.

Good idea. It has no state apart from the HookContainer. I'll just edit the task description to call it a class rather than a service for now.

tstarling updated the task description. (Show Details)Dec 10 2019, 5:29 PM

That change weakened the rationale for having a centralized HookRunner class for all core hooks, do you think I should reconsider that?

I don't feel strongly either way on that. If we often had code calling hooks from different subsystems there would be advantage to not requiring that code to have multiple HookRunner-type objects, but that doesn't seem likely to actually be the case. Usually code only has one hook, or a few that are related enough they'd probably be in the same runner anyway.

From a quick glance,

Good:

  • I like the minimal and light approach to the problem.
  • I note that this also gets us most (if not, all) the way toward solving the (otherwise orthogonal) "hooks documentation" problem.

Unsure:

  • I'd like to start embracing modularity more within core, perhaps starting by encouraging that these hook interfaces are namespaced within specific sub components and not generic to "core" (thus stepping away from how we do API modules and Special pages in core). Long-term direction being that core namespaces would be more akin to extension namespaces in that MediaWiki\Watchlist would not only contain the value classes and service classes, but also the API/Special subclasses. Perhaps the hook interfaces would also be better off starting out in that category of things, and not in the "legacy" style of all in one directory where there is no clear ownership. This closely relates to the finishing of T166010 and modularization in general, but I think we can find a way to not make it dependent on that. E.g. we can say that it should be within namespaces where possible, and that as fallback (but not encouraged/promoted/examplifed) it would go in a generic namespace.
  • The idea of sharing a single hook container in extension with multiple hook callbacks seems fine at first, but it's not clear to me why we want to encourage this. In particular, I think this has worked out somewhat negatively in the Wikibase-family of extensions. Their hook singletons are too monolithic and closely coupled. It makes the code hard to maintain, and (more to my own interest) creates a performance nightmare in that to invoke any hook, you end up constructing an expensive tree of services for all dependencies of all hooks. See T177311 and T160678 for examples of this.
    • Can you think of examples where there would be an explicit gain in maintainability and (de)coupling by sharing a container instance with two (related) hooks? Intuitively, I think this goes against some of what we wrote at T212482 about how closely related hooks are likely an anti-pattern (think of the pre-AuthManager hooks).

Long-term direction being that core namespaces would be more akin to extension namespaces in that MediaWiki\Watchlist would not only contain the value classes and service classes, but also the API/Special subclasses.

Too bad we have to pick one. I find it convenient in maintaining the Action API that all the relevant classes in core are in one place rather than scattered about the codebase, but obviously that's less convenient for someone who wants to maintain "the watchlist" including the relevant API modules.

  • The idea of sharing a single hook container in extension with multiple hook callbacks seems fine at first, but it's not clear to me why we want to encourage this.

Did you mean "hook listener" in that sentence? In this proposal "HookContainer" is the equivalent of the Hooks class, that manages registration of listeners and dispatching calls to them.

Intuitively, I think this goes against some of what we wrote at T212482 about how closely related hooks are likely an anti-pattern (think of the pre-AuthManager hooks).

I saw the opposite there, actually. I tried to call that out in T212482#4869654; your reply in T212482#5248239 doesn't seem to have really addressed it.

  • The idea of sharing a single hook container in extension with multiple hook callbacks seems fine at first, but it's not clear to me why we want to encourage this. In particular, I think this has worked out somewhat negatively in the Wikibase-family of extensions. Their hook singletons are too monolithic and closely coupled. It makes the code hard to maintain, and (more to my own interest) creates a performance nightmare in that to invoke any hook, you end up constructing an expensive tree of services for all dependencies of all hooks. See T177311 and T160678 for examples of this.

This system provides the option of splitting up listener classes as finely as you like. If you're writing an extension and want one class per callback, then fine, do it that way. I'm tired of the architectural improvements in MediaWiki that take an all-or-nothing approach, linking virtues, so that code can either be perfect or terrible. Faced with that choice, many developers will choose to stick with terrible code. The goal of this proposal is to remediate the existing hook system as far as possible, providing an easy migration path for existing code.

I would be fine with discouraging the use of kitchen sink hook listener classes in documentation and by our own virtuous examples. I would not want to discourage it by making it impossible by technical means.

It's true that the performance implications would be concerning if the plan was to migrate all extension hook classes to use DI without splitting them up. I think existing massive hook classes in extensions could initially continue to use MediaWikiServices::getInstance() but switch from static functions to non-static functions with interfaces. That way, they get documentation and strong typing, they just don't get DI for now.

daniel added a project: User-Daniel.
apaskulin added a subscriber: apaskulin.
Base added a subscriber: Base.Dec 15 2019, 1:41 AM
tstarling updated the task description. (Show Details)Dec 16 2019, 12:25 AM
Osnard added a subscriber: Osnard.Dec 16 2019, 7:09 AM
MaxSem added a subscriber: MaxSem.Dec 16 2019, 10:48 AM

One thing to discuss here is the guidelines of how to evolve code using this approach: previously, you could add a new parameter to a hook and it will just work™️ with existing hook handlers. Even some parameter type changes were possible. Not the case in this RFC: existing hook handlers can't be changed easily.

Should we do hook versioning e.g

$this->hookRunner->onUserLoginComplete( $user, $injected_html, $direct );
$this->hookRunner->onUserLoginComplete2( $user, $injected_html, $direct, $newParameter );
. . .
$this->hookRunner->onUserLoginComplete15( $userLoginCompleteArgs3 );

Or should we require extensions to modify their code for every new MW version?

How extensions are supposed to support multiple MW versions?

I really like the proposal, especially using the same interface for handling and for triggering events.

One thing I don't quite see is why we still need hooks names, other than the names of the associated interfaces. This seems redundant and prone to inconsistencies. For legacy hook, there could be an aliasing mechanism by which extra names for a hook can be registered, which then get mapped to the name of an interface.

To mitigate inconvenience when a new version of a hook is introduced, we could support a mechanism to register adapters: When MyNewHook is introduced to replace MyOldHook, an adapter MyOldHookAdapter can written that implements MyNewHook and wraps an instance of MyOldHook. An instantiator callback (or object spec) for that adapter can then be registered with something like addAdapter( MyOldHook::class, MyNewHook::class, $instantiatorCallback ). When an eon registers a handler for MyOldHook, the instanstiator callback is used to create an instance of MyOldHookAdapter which gets registered as a MyNewHook instead of the original handler.

I *really* like the approach with using the interfaces and a service, like the proposed "HookContainer", to run the hooks.

I'm wondering if the interfaces could be leveraged more for executing the hooks, for example:

// hook container injected from MediaWikiServices
hookContainer->getHook(UserLoginComplete::class)
 ->onUserLoginComplete( $user, $injected_html, $direct );
}

The result of getHook implements the hook's interface (in this case UserLoginComplete), so we get type safety. The implementation returned by the hookContainer would run all hooks that have been registered for this interface.

Anomie added a comment.EditedDec 16 2019, 3:27 PM

One thing I don't quite see is why we still need hooks names, other than the names of the associated interfaces. This seems redundant and prone to inconsistencies.

The convention here seems to be that the interface's "base" name always matches the hook name. Using just the "base" name avoids repeating namespace prefixes unnecessarily in places like extension.json, and avoids potential collisions between same-named hooks existing in different namespaces.

As some examples of the latter if we didn't use hook names:

  • If MediaWiki\Foo\SomeName is an alias for MediaWiki\Bar\SomeName, we'd need extra logic to ensure that listeners registered for both names wind up being called. But at the same time not doing so for MediaWiki\Foo\SomeOtherName and MediaWiki\Bar\SomeOtherName which are actually different hooks that happen to have the same base name.
  • If MediaWiki\Foo\SomeName and MediaWiki\Bar\SomeName are different hooks, they can't be implemented by the same listener or runner because they'd both be defining "onSomeName" methods. Or else all the method names on all hook interfaces would be longer because they'd have to somehow incorporate the full namespace.

To mitigate inconvenience when a new version of a hook is introduced, we could support a mechanism to register adapters: When MyNewHook is introduced to replace MyOldHook, an adapter MyOldHookAdapter can written that implements MyNewHook and wraps an instance of MyOldHook. An instantiator callback (or object spec) for that adapter can then be registered with something like addAdapter( MyOldHook::class, MyNewHook::class, $instantiatorCallback ). When an eon registers a handler for MyOldHook, the instanstiator callback is used to create an instance of MyOldHookAdapter which gets registered as a MyNewHook instead of the original handler.

It seems to me that it would be simpler to just have the calling code call both ->onMyNewHook() and ->onMyOldHook(), with the latter following the usual deprecation process.

Extensions that follow the "release branches" policy would just update interface usage, hook signature, and extension.json all at once to drop use of the old versions and begin using the new.

Extensions using the "master" policy would probably[1] have to register multiple entries in HookListeners to deal with the relevant interfaces potentially not existing, and would probably have their onMyOldHook implementation look something like

public function onMyOldHook() {
    if ( interface_exists( MyNewHook::class ) ) {
        return; // new hook interface exists, assume it was/will be called
    }

    // hook logic goes here
}

[1]: Or not, if it's not actually checked that the listener actually implements the interface versus just providing the correctly-named method with a compatible signature. And if we don't start having the methods be like "MyHook_v1::onMyHook()" and "MyHook_v2::onMyHook()".

I'm wondering if the interfaces could be leveraged more for executing the hooks, for example:

// hook container injected from MediaWikiServices
hookContainer->getHook(UserLoginComplete::class)
 ->onUserLoginComplete( $user, $injected_html, $direct );
}

The result of getHook implements the hook's interface (in this case UserLoginComplete), so we get type safety. The implementation returned by the hookContainer would run all hooks that have been registered for this interface.

What's the difference between that and the HookRunner class(es) as proposed?

// hook container injected from MediaWikiServices
hookContainer->getHook(UserLoginComplete::class)
->onUserLoginComplete( $user, $injected_html, $direct );
}

Yes, except that the class that calls the hook shouldn't have the HookContainer injected -- it should have handlers for the individual hooks injected.

The wiring file would have:

function ( MediaWikiServices $services ) {
    return new MyService(
        $services->getHookContainer()->getHook(OnFooHandler::class),
        $services->getHookContainer()->getHook(FooBarFilter::class)
    );
}

And the, in the service class:

...
 $this->onFooHandler->onFoo( $something );
...
 $this->fooBarFilter->filterFooBar( $foo, $bar );
daniel added a comment.EditedDec 16 2019, 5:48 PM

The convention here seems to be that the interface's "base" name always matches the hook name. Using just the "base" name avoids repeating namespace prefixes unnecessarily in places like extension.json, and avoids potential collisions between same-named hooks existing in different namespaces.

Hooks with the same name won't collide when they are namespaced. With "plain" hook names, prefixes have to be used by extensions to avoid that. Why not just namespaces as prefixes, then?

As some examples of the latter if we didn't use hook names:

  • If MediaWiki\Foo\SomeName is an alias for MediaWiki\Bar\SomeName, we'd need extra logic to ensure that listeners registered for both names wind up being called.

Aliases would be resolved when hook handlers get registered. That's indeed the only time aliases would be resolved.

But at the same time not doing so for MediaWiki\Foo\SomeOtherName and MediaWiki\Bar\SomeOtherName which are actually different hooks that happen to have the same base name.

I'd only resolve aliases explicitly defined as such. Not just cut away the prefix.

  • If MediaWiki\Foo\SomeName and MediaWiki\Bar\SomeName are different hooks, they can't be implemented by the same listener or runner because they'd both be defining "onSomeName" methods. Or else all the method names on all hook interfaces would be longer because they'd have to somehow incorporate the full namespace.

I agree that this is a potential problem, but I don't see how using "plain names" makes it any better. It just makes all names longer, the hook name and the method name.

It seems to me that it would be simpler to just have the calling code call both ->onMyNewHook() and ->onMyOldHook(), with the latter following the usual deprecation process.

That's indeed the simplest solution, but it has a downside that is already problematic in core code now: if the old signature requires an object that is not needed by the new hook, and would have to be constructed just to satisfy the legacy hook's signature, it's nicer to encapsulate that logic in an adapter class. Allows for automated deprecation warnings, and reduces boiler plate code in the application logic. This also ensures that the code is only run when actually needed - though it might cause such code to run once per handler, instead once per call...

Extensions using the "master" policy would probably[1] have to register multiple entries in HookListeners to deal with the relevant interfaces potentially not existing, and would probably have their onMyOldHook implementation look

This actually raises a serious issue: making extensions forward-compatible. An extension cannot easily make use of the new interface unless it *requires* that new interface to be present. We'd have to make hook registration depend on what is available, perhaps by encoding a conditional in the object spec.

As some examples of the latter if we didn't use hook names:

  • If MediaWiki\Foo\SomeName is an alias for MediaWiki\Bar\SomeName, we'd need extra logic to ensure that listeners registered for both names wind up being called.

Aliases would be resolved when hook handlers get registered. That's indeed the only time aliases would be resolved.

Doing this as stated would require loading the PHP file for every hook registered, even if it's not going to be called. We generally try to avoid reading PHP files that aren't actually going to be used during the execution.

In some cases that might result in an undesired exception, if the hook isn't going to be called because it doesn't actually exist in the current version of MediaWiki (i.e. it's being registered for back-compat or forward-compat).

I agree that this is a potential problem, but I don't see how using "plain names" makes it any better. It just makes all names longer, the hook name and the method name.

Better "onFooSomeName" than "onMediaWiki$Foo$SomeName" (or whatever mangling we choose for the backslashes).

That's indeed the simplest solution, but it has a downside that is already problematic in core code now: if the old signature requires an object that is not needed by the new hook, and would have to be constructed just to satisfy the legacy hook's signature, it's nicer to encapsulate that logic in an adapter class.

It seems to me that creating a whole adapter class and having to register it and so on is far more complicated than just doing

// OldHook needs an OldObj
$oldObj = new OldObj( ... );
$this->hookRunner->onOldHook( $oldObj );

where it's needed.

If the equivalent of "new OldObj" is actually expensive enough to care, wrap the whole thing in if ( $this->hookContainer->isRegistered( 'OldHook' ) ) {.

Allows for automated deprecation warnings,

Adapters aren't needed for this. The HookRunner's implementation of onSomeOldThing() could set $options['deprecatedVersion'] or the like regardless of any adapter.

and reduces boiler plate code in the application logic.

Considering that most hooks have exactly one caller, this too seem unlikely to be much of a concern.

This also ensures that the code is only run when actually needed - though it might cause such code to run once per handler, instead once per call...

In the best case you're running a bunch of extra code to register the adapter just before the hook is called to try to avoid running a probably-similar amount of code for just creating the object. In the worst case you're running a bunch of code during setup to register all these adapters even if they're only rarely needed.

Extensions using the "master" policy would probably[1] have to register multiple entries in HookListeners to deal with the relevant interfaces potentially not existing, and would probably have their onMyOldHook implementation look

This actually raises a serious issue: making extensions forward-compatible. An extension cannot easily make use of the new interface unless it *requires* that new interface to be present. We'd have to make hook registration depend on what is available, perhaps by encoding a conditional in the object spec.

As currently proposed, this isn't much of a problem. The listener class isn't loaded until a hook on it is actually called, so it doesn't matter if it would use an interface that doesn't exist when the hook is never called. So, as I said, the extension would merely have to split up its listeners across more than one class. Which is already proposed as a suggestion anyway for other reasons, see T240307#5729964 and T240307#5733085.

public function onMyOldHook() {
    if ( interface_exists( MyNewHook::class ) ) {
        return; // new hook interface exists, assume it was/will be called
    }
    // hook logic goes here
}

This seems awkward, and we do have a lot of extensions using the "master" compatibility policy. Let's move this logic to HookContainer.

Let's have a concept of hook versions in core, and if an extension registers listeners for multiple versions of a hook, only the latest known one is called. HookContainer::getLegacyHandlers() and HookContainer::getListeners() can check for later versions of a hook within the same extension and filter out deprecated listeners.

I considered just declaring the fallback sequence in extension.json, omitting the core registry of hook names. The problem with this is that HookContainer won't know what the latest version is. If MyHook1, MyHook2 and MyHook3 are registered in an extension, and the latest version in core is MyHook2, then the call to MyHook2 needs to be unfiltered. So it doesn't help if the extension is telling us that MyHook3 is the latest.

We could still have a "filterable" flag in extension.json enabling this filtering behaviour, allowing extensions to opt in and making it more obvious what is going on by reading the extension source code. We just can't omit the core hook version list altogether, since it's needed for forwards compatibility of core.

daniel added a comment.EditedDec 17 2019, 10:46 AM

Aliases would be resolved when hook handlers get registered. That's indeed the only time aliases would be resolved.

Doing this as stated would require loading the PHP file for every hook registered, even if it's not going to be called. We generally try to avoid reading PHP files that aren't actually going to be used during the execution.

The aliases would be a string to string map in HookContainer, with values for core hardcoded. I don't see how that would cause any classes to be loaded.

Better "onFooSomeName" than "onMediaWiki$Foo$SomeName" (or whatever mangling we choose for the backslashes).

If you already know onFooSomeName is unique, then how would the use of namespaces make the method name any longer?

It seems to me that creating a whole adapter class and having to register it and so on is far more complicated than just doing

// OldHook needs an OldObj
$oldObj = new OldObj( ... );
$this->hookRunner->onOldHook( $oldObj );

where it's needed.
If the equivalent of "new OldObj" is actually expensive enough to care, wrap the whole thing in if ( $this->hookContainer->isRegistered( 'OldHook' ) ) {.

It's not just about being expensive. Construction of legacy objects often relies on global state and/or drags in unwanted dependencies on kitchen sink classes. The adapter approach would avoid that.

Having support for "automatic" adapters doesn't mean they always have to be used. You can of course still call both hooks, keeping simply cases simple. I just want a way out of the pain that this is currently causing.

Adapters aren't needed for this. The HookRunner's implementation of onSomeOldThing() could set $options['deprecatedVersion'] or the like regardless of any adapter.

True.

and reduces boiler plate code in the application logic.

Considering that most hooks have exactly one caller, this too seem unlikely to be much of a concern.

How many places we have that currently construct a legacy Revision object from a RevisionRecord so it can be passed to a hook? With boilerplate code for checking whether the hook is actually used, and all that.

With adapters, we'd have one adapter class per hook - but that "mess" would be located with the listener interface; we would be able to keep it out of the actual logic, making the code more readable.

In the best case you're running a bunch of extra code to register the adapter just before the hook is called to try to avoid running a probably-similar amount of code for just creating the object.

No, the caller doesn't register the adapter. The caller doesn't know or care about the adapter. Knowledge about adapters and aliases is solely in the wiring code that creates the HookContainer.

In the worst case you're running a bunch of code during setup to register all these adapters even if they're only rarely needed.

It's a single array. Much better than having all the boiler plate code sprinkled across the code base for legacy hooks that are rarely needed. The mess would be in once place, outside "normal" code.

As currently proposed, this isn't much of a problem. The listener class isn't loaded until a hook on it is actually called, so it doesn't matter if it would use an interface that doesn't exist when the hook is never called. So, as I said, the extension would merely have to split up its listeners across more than one class. Which is already proposed as a suggestion anyway for other reasons, see T240307#5729964 and T240307#5733085.

That need to split is what I was thinking of. I suppose the split can be per MW version, so not as problematic.

Let's have a concept of hook versions in core, and if an extension registers listeners for multiple versions of a hook, only the latest known one is called. HookContainer::getLegacyHandlers() and HookContainer::getListeners() can check for later versions of a hook within the same extension and filter out deprecated listeners.

I like that idea, but it's not entirely clear to me what that would look like in practice. Is it actual numbered versions of an interface, e.g. UserLoginComplete2, UserLoginComplete3, etc? Or just one hook superseding another?

We could still have a "filterable" flag in extension.json enabling this filtering behavior, allowing extensions to opt in and making it more obvious what is going on by reading the extension source code.

If we want to give the extension control over the filtering, perhaps the registration mechanism could support some kind of "register only if some other hook X isn't defined" or "register as a fallback for X".

The discussion above is mostly about handling legacy names hook versioning. While it's important to discuss these here so we don't paint ourselves into a corner, it seems to me like the proposal as stated is mostly good as it is. There are a few things I'd change:

  • Reserve the term "listener" for purely passive hook handlers, like the "action" handlers in T212482. Callbacks that can affect the caller aren't listeners in my understanding.
  • Clarify the relationship between hook names, class names, and method names. All three should represent the exact same thing, but they are defined independently, making inconsistencies likely. Since all three are globally unique, it would make sense to define a strict mapping between them: the class name must be the hook name plus a suffix that indicates the hook type, and the method name must be the hook name plus a prefix that indicates the hook type. E.g. the "UserLoginComplete" would use the UserLoginCompleteListener interface which declares the onUserLoginComplete() method. And the "UserCan" hook uses the UserCanFilter interface which declares the filterUserCan() method. The namespace would be determined by where the hook is called (see next point).
  • Define hook interfaces in the same namespace as their caller. Centralizing the hook interfaces in a single namespace inevitable leads to cyclic dependencies between namespaces: hooks depend on concepts declared in namespace X, and namespace X depends on the hook namespace.
  • Require hooks to be explicitly declared in the HookContainer (or HookRunner). Registering a handler for an unknown hook should cause an exception. This protects against typos in the registration, and allows registration to be conditional on which hook points are supported. Since defining a hook already requires boiler pate code to be added to HookRunner, the registration should happen close to that. Alternatively, HookContainer could check on registration if HookRunner has a method (and/or implements an interface) that matches the hook handler being registered. Might be expensive to do it this way, though.

and we do have a lot of extensions using the "master" compatibility policy

I was curious, so I checked the policies in Wikimedia-deployed extensions.

  • 15 are "master".
  • 53 are "release branches".
  • 118 don't specify a policy, including 4 which don't even have a page with {{Extension}} on mediawiki.org.

Let's have a concept of hook versions in core, and if an extension registers listeners for multiple versions of a hook, only the latest known one is called. HookContainer::getLegacyHandlers() and HookContainer::getListeners() can check for later versions of a hook within the same extension and filter out deprecated listeners.

I'll be interested to see more details as to how that's proposed to work. I'm a bit concerned it'll turn out to be a complex system trying to avoid a relatively rare problem. But if the solution is tightly scoped, maybe the complexity won't be as much as I fear.

The aliases would be a string to string map in HookContainer, with values for core hardcoded. I don't see how that would cause any classes to be loaded.

Sure, if you maintain a list of hook data completely separate from the actual interfaces, you could avoid having to reflect on the interfaces to find the aliases. But then you're adding yet another thing that need maintaining.

Better "onFooSomeName" than "onMediaWiki$Foo$SomeName" (or whatever mangling we choose for the backslashes).

If you already know onFooSomeName is unique, then how would the use of namespaces make the method name any longer?

If we allow the namespace to be significant, then the code in HookContainer that's building the method name to call based has to take it into account.

Unless you're proposing that instead of a cheap string manipulation to determine the method name, we have to use slow reflection to extract the method name from the interface? Or just adding more global data to your global list of every known hook, I suppose.

It's not just about being expensive. Construction of legacy objects often relies on global state and/or drags in unwanted dependencies on kitchen sink classes. The adapter approach would avoid that.

The adapter approach wouldn't avoid it, it would just hide it inside the adapter.

Having support for "automatic" adapters doesn't mean they always have to be used. You can of course still call both hooks, keeping simply cases simple. I just want a way out of the pain that this is currently causing.

Sounds very much like you're getting to the point of YAGNI now.

How many places we have that currently construct a legacy Revision object from a RevisionRecord so it can be passed to a hook? With boilerplate code for checking whether the hook is actually used, and all that.
With adapters, we'd have one adapter class per hook - but that "mess" would be located with the listener interface; we would be able to keep it out of the actual logic, making the code more readable.

"We have a dozen legacy hooks, each called from one place. Much better to have a dozen adapter classes instead!" All the same boilerplate, plus added infrastructure, and it's more work to find where the legacy hook is actually being called from too.

Personally, I find code more readable when it's not scattered all about the codebase with registrations of handlers elsewhere causing action at a distance.

In the best case you're running a bunch of extra code to register the adapter just before the hook is called to try to avoid running a probably-similar amount of code for just creating the object.

No, the caller doesn't register the adapter. The caller doesn't know or care about the adapter. Knowledge about adapters and aliases is solely in the wiring code that creates the HookContainer.

I had expected you'd be proposing the worst case.

In the worst case you're running a bunch of code during setup to register all these adapters even if they're only rarely needed.

It's a single array. Much better than having all the boiler plate code sprinkled across the code base for legacy hooks that are rarely needed. The mess would be in once place, outside "normal" code.

We seem to differ on this point. I like the code affecting a thing to be proximate to the thing (where it can be easily found when looking at the thing), while you prefer action at a distance to try to hide the messiness.

And the "UserCan" hook uses the UserCanFilter interface which declares the filterUserCan() method.

What's the advantage in having different naming conventions for different kinds of hooks? It seems extra complexity (specifying or determining the "kind" to determine which prefix to use) for no clear benefit.

  • Define hook interfaces in the same namespace as their caller. Centralizing the hook interfaces in a single namespace inevitable leads to cyclic dependencies between namespaces: hooks depend on concepts declared in namespace X, and namespace X depends on the hook namespace.

Personally I'm more concerned with actual dependencies than with imagined dependencies between unrelated things that happen to be in the same namespace.

On the other hand, keeping the hook interfaces associated with the code that calls them (by placing them in the same directory, which in PSR-4 means the same namespace) has distinct benefits that may outweigh the benefits of having all the hook interfaces together for ease of reference. Particularly since we can't really do "together" anyway once extensions are involved. So I'm fine with the bullet, just not the reasoning used.

  • Require hooks to be explicitly declared in the HookContainer (or HookRunner). Registering a handler for an unknown hook should cause an exception. This [...] allows registration to be conditional on which hook points are supported.

It doesn't allow registration to be conditional, it demands it. It requires using extension.json's "callback" with tests for which hooks are actually available in the specific version of MW+extensions being run, or else requires a micro-language in extension.json to specify which hooks to register with which specific versions of MW and/or other extensions.

Since defining a hook already requires boiler pate code to be added to HookRunner, the registration should happen close to that.

This makes little sense to me.

  • "HookRunner" in this proposal isn't just one thing. Even if we have only one in core, each extension would still have its own. But if we're putting the hook interfaces in the namespace with their callers, it would probably make more sense to similarly have a HookRunner per namespace as well instead of one HookRunner trying to do everything.
  • The code in HookRunner is a method definition on a class. Nothing is actually run there during setup (ideally HookRunner.php isn't even loaded during setup).

The best you might do is have the "registration" be some static array on a HookRunner class, and then somehow register all the HookRunners with something else so setup can load them all to collect all their arrays.

Alternatively, HookContainer could check on registration if HookRunner has a method (and/or implements an interface) that matches the hook handler being registered. Might be expensive to do it this way, though.

Again, there's more than one HookRunner.

That probably would be expensive. We should avoid unnecessarily loading every hook-interface's PHP file or doing a bunch of reflection during setup.

// ... many interfaces would be added here ...

That would be maybe hundreds of interfaces, do we really should do it? I don't have a better solution though given that I like everything else about the proposal.

and we do have a lot of extensions using the "master" compatibility policy

I was curious, so I checked the policies in Wikimedia-deployed extensions.

  • 15 are "master".
  • 53 are "release branches".
  • 118 don't specify a policy, including 4 which don't even have a page with {{Extension}} on mediawiki.org.

I'll go fix those 118 to "release branches" if you sling me the list. ;-)

More generally, however, Tim is right; many of the hundreds of extensions not deployed on Wikimedia seem to (at least attempt to) operate on a "master" compatibility basis, however much we try to discourage it. Not sure how much TechCom/etc. would like to use this opportunity to more seriously discourage this practice.

That would be maybe hundreds of interfaces, do we really should do it? I don't have a better solution though given that I like everything else about the proposal.

I think it's worth it. While the number of files may look big, but we are basically adding 5 lines of declaration and 5 lines of documentation for each hook, which was previously missing. Also, you'd get a nice overview of hooks used in a module just by looking at the files in the corresponding directory (if we put the interfaces with the callers).

The boilerplate in the HookRunner is a bit annoying, but no worse than MediaWikiServies.

I'll go fix those 118 to "release branches" if you sling me the list. ;-)

I recreated it just for you. P9944

More generally, however, Tim is right; many of the hundreds of extensions not deployed on Wikimedia seem to (at least attempt to) operate on a "master" compatibility basis, however much we try to discourage it. Not sure how much TechCom/etc. would like to use this opportunity to more seriously discourage this practice.

The extra boilerplate in T240307#5746862 for extensions wanting to use "master" doesn't seem like much to me. They already need similar code for detecting added methods and such to classes.

This reminds me a lot of drupal/core-plugin and their Plugin API which was implemented for many of the same reasons.

Sounds like a good idea to me! :)

One thing I don't quite see is why we still need hooks names, other than the names of the associated interfaces. This seems redundant and prone to inconsistencies. For legacy hook, there could be an aliasing mechanism by which extra names for a hook can be registered, which then get mapped to the name of an interface.

This has basically already been said, but: the method name is the hook name with "on" prepended. HookContainer only needs the method name, it does not need to know the interface name. The reason for passing the hook name to HookContainer is so that it is able to figure out the right method to call.

Maybe I missed it, but I have not seen a positive reason from Daniel as to why we should use interface names instead of method names, except that retaining hook names "seems redundant and prone to inconsistencies", which I don't agree with. The discussion seems to consist of Daniel telling Anomie why it is not quite as bad as we might think, rather than why it is a good idea.

I like that idea, but it's not entirely clear to me what that would look like in practice. Is it actual numbered versions of an interface, e.g. UserLoginComplete2, UserLoginComplete3, etc? Or just one hook superseding another?

These are not mutually exclusive, but the latter is closer to the truth. The idea is that core would know the complete deprecation chain. That might be UserLoginComplete -> UserLoginComplete2 -> UserLoginComplete3, or it might be ArticleSaveComplete -> PageContentSaveComplete. The important thing is that every deprecation is supported by core, to avoid the need for interface_exists() etc. It's not important that the hooks be exactly equivalent.

Let's call it a hook deprecation system rather than a hook versioning system.

If we want to give the extension control over the filtering, perhaps the registration mechanism could support some kind of "register only if some other hook X isn't defined" or "register as a fallback for X".

There's no concept of a hook being defined, so to introduce such conditional syntax, we would have to introduce a system for registering hooks. I see you're proposing that below, but in my proposal, only deprecated hooks need to be registered.

  • Require hooks to be explicitly declared in the HookContainer (or HookRunner). Registering a handler for an unknown hook should cause an exception.

This lacks forwards compatibility. Registering a handler for an unknown hook should be allowed, for forwards compatibility.

  • Reserve the term "listener" for purely passive hook handlers, like the "action" handlers in T212482. Callbacks that can affect the caller aren't listeners in my understanding.

They can just be called hook interfaces. This project will not include reviewing the code of all ~680 hooks and judging how passive they are.

  • Define hook interfaces in the same namespace as their caller. Centralizing the hook interfaces in a single namespace inevitable leads to cyclic dependencies between namespaces: hooks depend on concepts declared in namespace X, and namespace X depends on the hook namespace.

I think this is possible. Note that out of 203 files in includes/ with Hooks::run() in them, only 25 of them are in a namespace. You're talking about adding maybe 500 interfaces to the root namespace. Your complaint about cyclic dependencies between namespaces is theoretical at this point.

The namespaces will all move when T166010 is implemented. We could do T166010 first, I guess.

  • Clarify the relationship between hook names, class names, and method names. All three should represent the exact same thing, but they are defined independently, making inconsistencies likely. Since all three are globally unique, it would make sense to define a strict mapping between them: the class name must be the hook name plus a suffix that indicates the hook type, and the method name must be the hook name plus a prefix that indicates the hook type. E.g. the "UserLoginComplete" would use the UserLoginCompleteListener interface which declares the onUserLoginComplete() method. And the "UserCan" hook uses the UserCanFilter interface which declares the filterUserCan() method. The namespace would be determined by where the hook is called (see next point).

I will clarify the current proposal. I don't agree with having a strict mapping between the interface name and the hook name. In this proposal, interface names can freely change, say to support renamespacing, as long as a b/c alias remains. There will need to be a clear and simple policy to derive the interface name from the hook name, because this work will be completed by juniors or contractors. As a compromise, I would suggest: take the hook name, prepend the caller namespace, and append "Hook". So the interface for UserLoginComplete becomes UserLoginCompleteHook (in the root namespace). PageContentSave by the same rules will have the interface MediaWiki\Storage\PageContentSaveHook.

tstarling updated the task description. (Show Details)Jan 8 2020, 4:49 AM

Task description edit: remove the term "listener" and move hook interfaces into the namespace of the caller.

tstarling updated the task description. (Show Details)Jan 8 2020, 5:07 AM

Task description edit: I came up with a pretty neat and simple hook deprecation system, slightly different to what I proposed in the comment above. See what you think. It doesn't need core to be aware of the deprecation chain, it just needs a list of deprecated hooks.

Anomie added a comment.Jan 8 2020, 4:31 PM

Task description edit: I came up with a pretty neat and simple hook deprecation system, slightly different to what I proposed in the comment above. See what you think. It doesn't need core to be aware of the deprecation chain, it just needs a list of deprecated hooks.

I like it. Simple and straightforward while still covering the necessary use cases.

daniel added a comment.Jan 8 2020, 5:55 PM

Task description edit: remove the term "listener" and move hook interfaces into the namespace of the caller.

Thanks. My intention was to keep the hook's definition close to the caller - that primarily means in the same directory, the namespace is really secondary.

I don't consider the definition of the hook interface to be bolierplate code, by the way. All hook interfaces are different, there is nothing there that could be auto-generated or copied.

daniel added a comment.EditedJan 8 2020, 5:56 PM

I like it. Simple and straightforward while still covering the necessary use cases.

+1

Anomie added a comment.Jan 8 2020, 6:35 PM

I don't consider the definition of the hook interface to be bolierplate code, by the way. All hook interfaces are different, there is nothing there that could be auto-generated or copied.

My impression was that "boilerplate code" referred to the contents of HookRunner.php, not the interface definitions. The majority would look like

/** @inheritDoc */
function onSomething( $some, $args, $here ) {
    return $this->hookContainer->run( 'Something', [ $some, $args, $here ] );
}

Some would also pass a value for $options.

This RFC has been put on Last Call per the TechCom meeting on January 8. If there are no issues left unresolved by January 22, this RFC will be approved as proposed.

Eevans triaged this task as Medium priority.Jan 10 2020, 5:34 PM

This RFC has been approved as proposed per today's TechCom meeting

Change 571297 had a related patch set uploaded (by Nikki Nikkhoui; owner: Nikki Nikkhoui):
[mediawiki/core@master] Hook Container

https://gerrit.wikimedia.org/r/571297

Change 574266 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] [WIP] Automatically generated hook interfaces

https://gerrit.wikimedia.org/r/574266

Change 574915 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/tools/generateHookInterfaces@master] A script for generating hook interfaces per T240307 (initial commit)

https://gerrit.wikimedia.org/r/574915

Change 575408 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Use array literals when calling Hooks::run()

https://gerrit.wikimedia.org/r/575408

Change 575408 merged by jenkins-bot:
[mediawiki/core@master] Use array literals when calling Hooks::run()

https://gerrit.wikimedia.org/r/575408

Change 577414 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] [WIP] Automatically generated HookRunner

https://gerrit.wikimedia.org/r/577414