Page MenuHomePhabricator

Port GrowthExperiments to PageUpdated event
Open, HighPublic

Description

GrowthExperiments currently uses the PageSaveComplete for several features, often triggering a DeferredUpdate in the handler. Port this to the new event mechanism, listening to the PageUpdated event.

Note that the new event system is still experimental. Using it in this extension will help us understand if the new design is useful, and flush out any subtle bugs, as part of the WE5.2.3 hypothesis.

Event Timeline

daniel removed aaron as the assignee of this task.Nov 15 2024, 8:47 AM
daniel added a subscriber: aaron.
daniel triaged this task as High priority.Nov 15 2024, 8:50 AM
Ladsgroup renamed this task from Port GrowthExperiemtns to PageUpdated event to Port GrowthExperiments to PageUpdated event.Nov 15 2024, 12:20 PM

Summary of a conversation we had on Slack:

Michael:
As feedback: looking at the Linter patch, I'm very surprised that the new event handler does not implement an event-specific interface like the removed hook handler did. If find those interfaces very useful for finding documentation, ensuring absence of typos, finding other implementations, etc.

Daniel:
We could add such an interface, but we'd have to have multiple (one per dispatch mode), which seems less than ideal.The idea is that the event class itself covers the needs you describe: the parameter list with type hints and documentation is in the constructor signature of the event object. The semantics would be described in the doc block of the event class.

Finding other implementations would work similarly: you look for other methods that take the same type of event as a parameter.

Michael:
Is this still a TODO for PageUpdatedEvent? Because from looking at it, I'm not clear what options I have for subscribing to that event

Daniel:
Yes, the docs for PageUpdated are very much a TODO. We didn't write them since we were still shifting things around.

This is definitely something to get used to, and we need to document it. We can still add the interfaces - but that means that developers who define new events have to create the event class, and multiple interfaces (PageUpdatedAfterCommitListener, PageUpdatedDuringUpdateListener, PageUpdatedInJobListener, etc)

Michael:
Yea, it is sadly not possible in PHP to make an interface that says "A class must implement ONE of the following methods: ..."

Daniel:
It's indeed a bit unclear how to best facilitate the discovery of the listener method name (e.g. handlepageUpdatedEventAfterCommit). We could mention it in the docs, but it's really something that is implemented in EventSubscriberBase. It's a convenience of that base class, not a property of the system as such. If you implement DomainEventSubscriber::registerListeners() directly, you could provide whatever callback you want to DomainEventSource::registerListener().

Daniel:
Thank you for the feedback btw, it's valuable - I expect that many people will have the same question.
For now, I made a patch to improve the documentation of PageUpdatedEvent: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1097431

Michael wrote on Slack:

Thanks! Am I reading the EventSubscriberBase-class comment correctly, that for now only the handle{$eventType}EventAfterCommit mechanism is implemented and others will be added in the future?

Also, could the documentation somewhere be clearer on what AfterCommit means exactly? Intuitively, this could: after db-write but before response is being sent to the user, in a deferred update, in a job, ... I think you mentioned something about this being currently a deferred update under the hood, but I think it would be good to be explicit about what contract is intended here.

Yes, only the "AfterCommit" mode is currently available in core, suport for "DuringTransaction" is pending review, see https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1090555. Support for an "InJob" mode is planned for next quarter.

The fact that "AfterCommit" currently doesn't specify when exactly the listeners will be called is intentional - we want to give implementations as much freedoma as possible. But we also want to provide event consumers with as much guarantees as they need. So tweaking this contract is an important function of this round of feedback. Perhaps it would be a good idea to specify that "AfterCommit" listsners will be invoked within the same process/request that triggered the event.

Maybe we need to be event more specific - currently, "AfterCommit" listeners are called after the response has been sent. But perhaps we want to change that and make "AfterCommit" mean "pre-send", and add a separate "AfterResponse" mode. Would that distinction be important for your use case?

Thank you for elaborating!

Maybe we need to be event more specific - currently, "AfterCommit" listeners are called after the response has been sent. But perhaps we want to change that and make "AfterCommit" mean "pre-send", and add a separate "AfterResponse" mode. Would that distinction be important for your use case?

I'm not sure. To me, figuring that out is part of the work that will need to go into the "Investigate ..." subtasks of this task. Probably, that distinction will not be important, but maybe there is some subtlety in one of these hooks that is not obvious during a superficial glance.

More generally, I'm thinking about work that might currently be happening in a BeforePageDisplayHookHandler that checks whether the current request has just saved an article and behaves differently due to that. Maybe that could also live in a handlePageUpdatedEventAfterResponse event handler? But this is basically just a random thought at this point. (Do we even return the new page in the same request when the user clicked "save" or do we redirect them to the newly saved page?)
Anyway, that's just a brainstormed example for how that distinction might in principle be useful.

I'm not sure. To me, figuring that out is part of the work that will need to go into the "Investigate ..." subtasks of this task. Probably, that distinction will not be important, but maybe there is some subtlety in one of these hooks that is not obvious during a superficial glance.

It will be very valuable to figure that out!

More generally, I'm thinking about work that might currently be happening in a BeforePageDisplayHookHandler that checks whether the current request has just saved an article and behaves differently due to that. Maybe that could also live in a handlePageUpdatedEventAfterResponse event handler?

BeforePageDisplay is designed to manipulate the OutputPage and Skin objects that get passed as parameters. because of this, it cannot be replaced by a nevent - events are read only, they have no back-chanel. Code that needs to impact the response in any way cannot live in an event listener.

But this is basically just a random thought at this point. (Do we even return the new page in the same request when the user clicked "save" or do we redirect them to the newly saved page?)

When using the traditional editor, we redirect from action=submit back to the canonical page URL. When using VisualEditor, we do neither. We just replace the page's content area on the fly, without reloading the page at all.

@daniel: One question that I noticed:

What would be a good convention for naming the directory that holds the event subscribers?

  • DomainEvents (that is a term from DDD, but here we mean something different)
  • DomainEventSubscribers
  • MediawikiEvents (But could also be events from other extensions or skins)
  • MediawikiDomainEventSubscribers (But could also be events from other extensions or skins)
  • MediawikiEventSubscribers (because these are Mediawiki-Events? Currently implemented.)
  • Events (Could be confused with other GE events or Echo-events)
  • ...

I'm noticing that I'm feeling particularly uneasy about DomainEvents because of its DDD connection. And if we ever want to migrate GE more to a DDD-based structure with explicit Bounded Contexts, then we might also have a very specific thing called DomainEvents that would collide with these events here.

See also: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/1100778/comment/3ee49412_abeff042/

What would be a good convention for naming the directory that holds the event subscribers?

I don't think there should be such a directory. grouping things by pattern is an anti-pattern in my mind. Instead, you would have one subscriber class implementing all the listeners needed for a given component sit in the (base) directory/namespace of that component. In core, we are currently calling them "component ingress objects", e.g. MediaWiki\Languages\LanguageEventIngress or MediaWiki\Search\SearchEventIngress. extensions typically defined only one component (GE is the exception) - so in the Linter extension, we put the LintSubscriber next to the Hooks class. ContentTranslation did a similer thing, introducing SignificantEditSubscriber. In simple cases, Events could be an ok name, analogous to Hooks.

In the case of GE, I'd expect separate subscriber classes next to where you have your different hook handlers, and named in a similar way.

DomainEvents (that is a term from DDD, but here we mean something different)

How is it different? The intent was to model the thing that DDD describes... Compare e.g. https://learn.microsoft.com/en-us/dotnet/architecture/microservices/microservice-ddd-cqrs-patterns/domain-events-design-implementation#domain-events-versus-integration-events.

I'm noticing that I'm feeling particularly uneasy about DomainEvents because of its DDD connection. And if we ever want to migrate GE more to a DDD-based structure with explicit Bounded Contexts, then we might also have a very specific thing called DomainEvents that would collide with these events here.

Hm... how would the concept that you plan to introduce be incompatible with what we have put into core now? Understanding that would be very valuable!

What would be a good convention for naming the directory that holds the event subscribers?

I don't think there should be such a directory. grouping things by pattern is an anti-pattern in my mind. Instead, you would have one subscriber class implementing all the listeners needed for a given component sit in the (base) directory/namespace of that component. In core, we are currently calling them "component ingress objects", e.g. MediaWiki\Languages\LanguageEventIngress or MediaWiki\Search\SearchEventIngress. extensions typically defined only one component (GE is the exception) - so in the Linter extension, we put the LintSubscriber next to the Hooks class. ContentTranslation did a similer thing, introducing SignificantEditSubscriber. In simple cases, Events could be an ok name, analogous to Hooks.

In the case of GE, I'd expect separate subscriber classes next to where you have your different hook handlers, and named in a similar way.

I agree that we want, on the top-level, to group subscribers by component/bounded context. However within a component, I understand that the current guidance/best practice is to create a separate handler of each hook:

However, care should be taken with this feature. Some services have expensive constructors, so requesting them when handling commonly-called hooks may damage performance. Also, some services may not be safe to construct from within a hook call.

The safest pattern for service injection is to use a separate handler for each hook, and to inject only the services needed by that hook.

Calling a hook with the noServices option disables service injection. If a handler for such a hook specifies services, an exception will be thrown when the hook is called.

To me, that line of reasoning makes sense for events as well. Thus, wondering how to name that directory within a component that holds the various handlers.
What do you think?

DomainEvents (that is a term from DDD, but here we mean something different)

How is it different? The intent was to model the thing that DDD describes... Compare e.g. https://learn.microsoft.com/en-us/dotnet/architecture/microservices/microservice-ddd-cqrs-patterns/domain-events-design-implementation#domain-events-versus-integration-events.

I'm noticing that I'm feeling particularly uneasy about DomainEvents because of its DDD connection. And if we ever want to migrate GE more to a DDD-based structure with explicit Bounded Contexts, then we might also have a very specific thing called DomainEvents that would collide with these events here.

Hm... how would the concept that you plan to introduce be incompatible with what we have put into core now? Understanding that would be very valuable!

To me, after reading that Microsoft link and also others, Domain Events seem to be around communication within the domain of one Bounded Context, not about communication across them. So, if Growth, hypothetically, were to model Mentorship as a Bounded Context, then this might have its own set of internal Domain Events, but they would be distinct from core's domain that encompasses the ContentModel of a page changing. Obviously, we still need the information that an edit has occurred because we're reacting to it, I'm mainly trying to wrap my head around the semantics.

That being said, I have yet to find a good discussion about how Bounded Contexts are supposed to communicate with each other. That Microsoft-link is all about microservices, which is not very applicable for us. In a DDD world, how should various Bounded Contexts within the same monolith communicate with each other?

The safest pattern for service injection is to use a separate handler for each hook, and to inject only the services needed by that hook.

Calling a hook with the noServices option disables service injection. If a handler for such a hook specifies services, an exception will be thrown when the hook is called.

To me, that line of reasoning makes sense for events as well. Thus, wondering how to name that directory within a component that holds the various handlers.
What do you think?

I think that this advice only applies in the most extreme cases. Typically, the hook handlers and event handlers for a given component will need a siumilar set of services. I'd only split into multiple classes if there are a great many handlers, or they have many and distinct sets of servies. Or if there is a hook that should run a in a specifically low latency environment where service initialization should be avoided (e.g. resource loader).

In general, I would opt to wrap related code in a single class. This is also the common pattern for hook handler objects in extensions.

To me, after reading that Microsoft link and also others, Domain Events seem to be around communication within the domain of one Bounded Context, not about communication across them.

My understanding is that they are used for both, though some sources distinguish between the domain event as an event in the world, and an integration event between software components.

So, if Growth, hypothetically, were to model Mentorship as a Bounded Context, then this might have its own set of internal Domain Events, but they would be distinct from core's domain that encompasses the ContentModel of a page changing.

The intent for the DomainEvent base class is to server as a basis for the event models in all domains. There is no "mediawiki core" domain - core will end up defining a handfull of domains, each exposing events, all of them extending DomainEvent.

I have not so far considered the case of a component that wants to use events internally, but doesn't want to allow other components to listen to these events. I suppose you could tag an event class as @internal to get that effect.

I don't see a probelm with having MentorshipAssignedEvent extends DomainEvent alongside PageUpdatedEvent extends DomainEvent, and using the same dispatcher to send them abd the same subscriber to receive them... Where do you see the problem in semantics?

That being said, I have yet to find a good discussion about how Bounded Contexts are supposed to communicate with each other. That Microsoft-link is all about microservices, which is not very applicable for us. In a DDD world, how should various Bounded Contexts within the same monolith communicate with each other?

I tend to transalte "service" to "component" on my head. The same patterns apply. One component listens to events emitted from the other component, which model state changes (outcomes of commands) in that other component.

The safest pattern for service injection is to use a separate handler for each hook, and to inject only the services needed by that hook.

Calling a hook with the noServices option disables service injection. If a handler for such a hook specifies services, an exception will be thrown when the hook is called.

To me, that line of reasoning makes sense for events as well. Thus, wondering how to name that directory within a component that holds the various handlers.
What do you think?

I think that this advice only applies in the most extreme cases. Typically, the hook handlers and event handlers for a given component will need a siumilar set of services. I'd only split into multiple classes if there are a great many handlers, or they have many and distinct sets of servies. Or if there is a hook that should run a in a specifically low latency environment where service initialization should be avoided (e.g. resource loader).

In general, I would opt to wrap related code in a single class. This is also the common pattern for hook handler objects in extensions.

My concern is that these generic HookHandler classes tend to grow way too large for large features. GE's HomepageHooks is over 1600 lines long, has 100 use ...; statements, and has 22 services injected and who knows how many more statically accessed in various hook handlers via $mwServices = MediaWikiServices::getInstance();. Extracting some private methods for readability and separation of concerns makes it even worse.

I'm not opposed to collecting closely related hooks into a shared class, that can make sense. But I'm not sure if it is a good default pattern for all but very simple features.

To me, after reading that Microsoft link and also others, Domain Events seem to be around communication within the domain of one Bounded Context, not about communication across them.

My understanding is that they are used for both, though some sources distinguish between the domain event as an event in the world, and an integration event between software components.

So, if Growth, hypothetically, were to model Mentorship as a Bounded Context, then this might have its own set of internal Domain Events, but they would be distinct from core's domain that encompasses the ContentModel of a page changing.

The intent for the DomainEvent base class is to server as a basis for the event models in all domains. There is no "mediawiki core" domain - core will end up defining a handfull of domains, each exposing events, all of them extending DomainEvent.

I have not so far considered the case of a component that wants to use events internally, but doesn't want to allow other components to listen to these events. I suppose you could tag an event class as @internal to get that effect.

I don't see a probelm with having MentorshipAssignedEvent extends DomainEvent alongside PageUpdatedEvent extends DomainEvent, and using the same dispatcher to send them abd the same subscriber to receive them... Where do you see the problem in semantics?

I think my main concern is the various domain models leaking all over the place. If I try to square Domain Driven Design with hexagonal/onion/clean architecture, then I'm concerned that we get dependencies on domain objects from a different context or even subdomain. But maybe the right way forward here is to be pragmatic and accept that, at least for now, we have a very sizable and unavoidable shared kernel.

That being said, I have yet to find a good discussion about how Bounded Contexts are supposed to communicate with each other. That Microsoft-link is all about microservices, which is not very applicable for us. In a DDD world, how should various Bounded Contexts within the same monolith communicate with each other?

I tend to transalte "service" to "component" on my head. The same patterns apply. One component listens to events emitted from the other component, which model state changes (outcomes of commands) in that other component.

Mh. That seems reasonable, if I accept the above, and is probably also the more immediate useful approach. For example, CommunityConfiguration published an event about how some configuration changed and GrowthExperiments listens to it and then schedules a job to incrementally invalidate potentially outdated link-recommendations. Technically, that would already be possible with the PageUpdated event, but it would mean inspecting every edit for "Is this about the one single page that I care about?", which feels less ideal.

My concern is that these generic HookHandler classes tend to grow way too large for large features. GE's HomepageHooks is over 1600 lines long, has 100 use ...; statements, and has 22 services injected and who knows how many more statically accessed in various hook handlers via $mwServices = MediaWikiServices::getInstance();. Extracting some private methods for readability and separation of concerns makes it even worse.

Instead of private methods on the same class, I'd recommend factoring out into service objects that get injected, or helper objects that get composed.

I'm not opposed to collecting closely related hooks into a shared class, that can make sense. But I'm not sure if it is a good default pattern for all but very simple features.

Yes, of course. If things grow too big, split them.

I think my main concern is the various domain models leaking all over the place. If I try to square Domain Driven Design with hexagonal/onion/clean architecture, then I'm concerned that we get dependencies on domain objects from a different context or even subdomain. But maybe the right way forward here is to be pragmatic and accept that, at least for now, we have a very sizable and unavoidable shared kernel.

I'm only superficially familliar with hexagonal architectures. I think of subscribers as input adapters that know about events emitted by other components, and isolate the rest of the component from knowledge about these events. They act as an "ingress" to the component.

Mh. That seems reasonable, if I accept the above, and is probably also the more immediate useful approach. For example, CommunityConfiguration published an event about how some configuration changed and GrowthExperiments listens to it and then schedules a job to incrementally invalidate potentially outdated link-recommendations.

Yes, that's how I imagine this should work!