Page MenuHomePhabricator

Design and document new Domain Events feature in MediaWiki core
Open, MediumPublic

Description

An implementation of DomainEventDispatcher was added as part of T377229: Implement DomainEventDispatcher (baseline). Subsequent tickets add features and functionality to this system.

Since then, a design doc has been started and iterated on. This ticket will track that work, as well as provide a place for discussions and considerations of the design.

Event Timeline

I created this because I wanted a visible and semi-centralized place to discuss things. Slack is bad, and the design doc comment threads are a little difficult to follow up on. I hope that's okay!

@daniel should we consider https://event.thephpleague.com/3.0/ ? in your comparison?

Nevermind, looked into it more. Looks nice and simple, but doesn't do a lot more than a basic PSR-14.

Moving a conversation from the design doc here.

Sinks and Sources terminology

In the Design Doc Glossary as of 2024-11-14, we have:

  • EventSink: interface used by the producer to send events
  • EventSource: interface used by the consumer to register listeners

So, a 'sink' is events are produced to, and a 'source' is where listeners get their events from.

However https://en.wikipedia.org/wiki/Observer_pattern has

an object, named the subject, maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods
[...]
the subject is usually named a "stream of events" or "stream source of events" while the observers are called "sinks of events

According to our glossary, observer is synonymous with 'listener', 'subscriber' or 'handler'.

I've found our definition of Sink and Source hard to reason about. I can see why you made Sinks the place where events are produced to, but I think it depends on the perspective?

Suggestion: Get rid of the Source and Sink terms and use something else.

What are the Sink and Source interfaces for?

I'm trying to imagine a case where a Sink and a Source can be implemented standalone. How would that work? I suppose a Sink could keep an in memory queue of events? Or produce to Kafka?

I'm trying to suggest some different names for these, but its a bit confusing to me at the moment.

dispatch() makes sense, but DomainEventSink::dispatch() doesn't really.

DomainEventSource::registerListener() makes some sense. But as stated above, these concepts seem a little backwards to me, and without Sink, Source seems too lonely :)

Ideas

Maybe all we need is a Dispatcher and a ListenerRegistry?

interface DomainEventDispatcherInterface {
    public function dispatch(DomainEvent $event);
}

interface ListenerRegistryInterface {
   public function registerListener($channel, $listener, $mode);
   public function getListeners($channel, $mode);
}

// then implementation:
class DomainEventDispatcher implements DomainEventDispatcherInterface {
    // DomainEventDispatcher composes ListenerRegistryInterface
    public function __construct(ListenerRegistryInterface $listenerRegistry) { ... }

    public function dispatch($event) {
        // use $this->listenerRegistry to invoke listeners.
    }

}

This would allow you to wire Subscriber classes with a ListenerRegistry, instead of having to pass Subscriber classes the Dispatcher. (If a Subscriber class also wants to Dispatch events, they could wire in a Dispatcher too).

$eventType vs $channel

In the current implementation, listeners are registered by string $eventType, and DomainEvents are published to listeners by their getEventType method.

This effectively (at least by default) means that listeners can only subscribe based on PHP classes/data models. (I suppose a DomainEvent subclass could override getEventType to make this more dynamic...but that could get weird!)

I wonder if a channel concept in the interfaces will be useful. We can default to using $eventType as the $channel, but being able to set channel will allow us to do things like:

$dispatcher->dispatch($eventV1, “mediawiki.page_change.v1”)
$dispatcher->dispatch($eventV2, “mediawiki.page_change.v2”)

This would allow us to do progressive backwards incompatible migrations to new DomainEvent data models.

This is similar to the 'streams, not schemas' advice at https://wikitech.wikimedia.org/wiki/Event_Platform/EventLogging_legacy

Streams, not schemas
EventLogging schemas were single use. Each schema corresponded to only one instrumentation, and eventually only one downstream SQL table.
Event Platform schemas are like data types for a dataset. A realtime event data set is called an 'event stream' (or just 'stream' for shorthand). Each stream must specify its schema, and a schema may be used by multiple streams.

In the case of DomainEvent, the value class is a data model, and an instance of it is an event. We should make sure that we only produce backwards compatible data models to the same channel.

See also https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#Stream_versioning

daniel triaged this task as High priority.Fri, Nov 15, 8:50 AM
daniel lowered the priority of this task from High to Medium.

What are the Sink and Source interfaces for?

The reason for having separate interface is that callers typically only need one or the other, rarely both. Interfaces should be structured based on what is used together, not by what is implemented together. That's one of the SOLID principles of OOP design, see https://en.wikipedia.org/wiki/Interface_segregation_principle.
Besides that, I can see special cases where you would implement one without the other. For example, a faux DomainEventSource can be used in a phpunit test to just collect the names of events that a subscriber registers for, to compare that to the list of events listed in extension.json. Similarly, we might have a DomainEventSource for listening to remote events, where the "intake" of the dispatch logic has a form that is different from DomainEventSink, perhaps a polling mechanism implemented in a method like consume( $channel, $maxEvents ). Or we could have a DomainEventSink that dispatches to a bus or queue, and doesn't implement the EventSourceInterface. Or we could refactor and have the implementation of DomainEventSink wrap an implementation of DomainEventSource, so they can both be swapped out separately.

If the interfaces are separate, we have all these options. If it was a single interface, we couldn't (easily/cleanly) do these things.

This would allow you to wire Subscriber classes with a ListenerRegistry, instead of having to pass Subscriber classes the Dispatcher. (If a Subscriber class also wants to Dispatch events, they could wire in a Dispatcher too).

I suppose the ListenerRegistry would have a getListeners( $event ) method, like the ListenerProvider in PSR-14? If we suppot different dispatch types, levels of isolation, or priorities associated with each listener, where would they go?...

In the end, ListenerProvider::getListeners() and Subscriber::registerListeners( $source ) are equivalent. The latter seems slightly more flexible, so I prefer it. I saw this approach in Laravel, and immediately liked it.

I wonder if a channel concept in the interfaces will be useful. We can default to using $eventType as the $channel, but being able to set channel will allow us to do things like [...] progressive backwards incompatible migrations to new DomainEvent data models.

Whether we call them "event types" or "channels" is arbitrary. The question of hierarchical names and versioning is relevant, though. It's relevant (in different ways) to extensions and to broadcast/published events.

I like ht hierarchical names, but they make it harder to map event names to method names. They's get rather ugly: after_mediawiki_page_change_v1. Not terrible per-se, but in violation of our code sniffer rules, we'd need exceptions...

Right now, I'm thinking that versioning is mostly relevant for the schemas used for serialization and publication. I'd leave that, and the mapping of event types (and serializations) to channels, to some kind of EventPublisher or EventRelay component that plugs into the event dispatcher.

In the end, ListenerProvider::getListeners() and Subscriber::registerListeners( $source ) are equivalent.

You mean ListenerProvider::registerListeners() and Subscriber::registerListeners( $source ) are equivalent?

(^ ah sorry, I misread this. You meant the PSR-14 ListenerProvider::getListeners.)

I haven't looked at your Subscriber stuff in depth yet, sorry about that. But I thought Subscriber was just sugar to make it easier to register multiple listeners? You are registering with the DomainEventDispatcher (Source, name TBD currently). I think with ListenerRegistery', I'm just suggesting a different name for DomainEventSource.

I was playing with Subscriber as a name for this, but IIUC 'subscriber' (as we've defined it) is just sugar for grouping listeners, and a listener is the actual registered function, not thing thing that is registering the method. Subscriber is the class implementing the listener functions, right?

If we suppot different dispatch types, levels of isolation, or priorities associated with each listener, where would they go?...

I suppose on this ListenerRegistery? I agree, this is kind of like PSR-14 ListenerProvider. DomainEventSource looks to me like a registry implementation of a ListenerProvider. Maybe the ListenerProvider interface is a good idea, and we should adopt it?

Or, where were you thinking priorities and other features would go?

The reason for having separate interface is that callers typically only need one or the other, rarely both.

I see! So in ServiceWiring, you would have a 'DomainEventSource' (name currently under bikeshed) service that listeners would use to register? The DomainEventSource instance might be DomainEventDispatcher, but the consumer would not necessarily know that, they would only know they have some instance of a DomainEventSource? (And same for producer: they'd only know they have an instance of DomainEventSink (name TBD)?).

I like ht hierarchical names, but they make it harder to map event names to method names.

This is not a strong opinion: but the use of method names to automatically register a listener seems a little bit...inflexible? If I wanted to change the priority of the listener, or the mode of the listener, I'd have to add this configuration to the class somehow? Again, I'm not opposed to this, but I don't see a lot of value added for the added inflexibility. But! You know much more about how people will expect to use this, so I leave it to you all!

I'm thinking that versioning is mostly relevant for the schemas used for serialization and publication.

It will definitely be easier to not think about this now (too much bikeshedding!) But not having the ability to do so might bite us.

Whether we call them "event types" or "channels" is arbitrary.

I think its not arbitrary. An event 'type' to me is associated with the Type of the DomainEvent class...or maybe I am misunderstanding the usage of type here.

Using the event type as the 'api' name would be like using a REST API handler class name as the URI Path.

Having a more flexible 'channel' concept in the interfaces (or something like it) could leave us more open for changing this later.

I was playing with Subscriber as a name for this, but IIUC 'subscriber' (as we've defined it) is just sugar for grouping listeners, and a listener is the actual registered function, not thing thing that is registering the method. Subscriber is the class implementing the listener functions, right?

The subscriber registers listeners with an event souces (aka registry). These listeners would typically, but not necessarily, be methods implemented in the subscriber class.

Or, where were you thinking priorities and other features would go?

Yes. If the dispatcher (aka registry/source) calls a getListeners() method on a listener provider (aka subscriber), the value returned by that method scouldn't just be a list of callbacks. it would have to be a more complex structure that contains additional information, such as dispatch mode and priorities. That seems inconvenient.

So in ServiceWiring, you would have a 'DomainEventSource' (name currently under bikeshed) service that listeners would use to register?

Yes, that's how it's implemented in core already. MediaWikiServices now has getDomainEventSurce() and getDomainEventSink(), both returning the same DomaineventDispatcher instance.

This is not a strong opinion: but the use of method names to automatically register a listener seems a little bit...inflexible? If I wanted to change the priority of the listener, or the mode of the listener, I'd have to add this configuration to the class somehow?

It's just a convenience. If you want anything special, like priorities or multiple events sharing the same listener, you'd implement registerListeners() to do that, instead of relying on the default implementation that goes by method name.

It will definitely be easier to not think about this now (too much bikeshedding!) But not having the ability to do so might bite us.

We can always add version IDs to event names. PageUpdated47 isn't pretty, but perfectly doable (47 would be the MediaWiki release).

I think its not arbitrary. An event 'type' to me is associated with the Type of the DomainEvent class...or maybe I am misunderstanding the usage of type here.

It's associated, but not the same. Some frameworks use the actual class name as the event typpe, but I wouldn't want that. Too brittle.

However, a listener subscribing to a certain type of event must have a guarantee of the type of DomainEvent object it will receive, to allow for the proper type to be used in the method signature. That's a feature, not a bug...

Using the event type as the 'api' name would be like using a REST API handler class name as the URI Path.
Having a more flexible 'channel' concept in the interfaces (or something like it) could leave us more open for changing this later.

If we are talking about consumers outside MediaWiki, I absultely agree. That's why I want the ability to have an n:m mapping between event types and channels. Much more flexible and easier to deal with compatibility issues, different serializations, fat vs skinny events, merging events from mutliple wikis in a single channel (or not), etc.

a listener subscribing to a certain type of event must have a guarantee of the type of DomainEvent object it will receive, to allow for the proper type to be used in the method signature. That's a feature, not a bug...

Agree. Any given channel should always have the same type. I'm just arguing that a one to one mapping between channel and type might be inflexible.

But! Let's drop it, I think you have a better sense of this inside of MW, and I agree that it is more important for external consumers.

Change #1092772 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] DomainEventDispatcher: rename concepts

https://gerrit.wikimedia.org/r/1092772

Change #1092772 merged by jenkins-bot:

[mediawiki/core@master] DomainEventDispatcher: rename concepts

https://gerrit.wikimedia.org/r/1092772

In a recent slack thread we discussed some interface and implementation class names.

An idea:

Interfaces:

  • DomainEventDispatcher - used by event producers to produce events
  • DomainEventListenerRegistr(ar|y)[1] - used by event consumers to register listeners for events
  • DomainEventSubscriber - Uses a DomainEventListenerRegistr(ar|y) instance to automate registering listeners

ServiceWiring Implementation:

  • DomainEventBroker - Implements DomainEventDispatcher and DomainEventListenerRegistr(ar|y)

[1] DomainEventListenerRegistrar vs DomainEventListenerRegistry - either is fine, to be decided soon.

We already did soem of thes changes. The situation is now:

  • DomainEventDispatcher - used by event producers to produce events (previously: DomainEventSink)
  • DomainEventSource - used by event consumers to register listeners for events (unchanged - it's rarely referenced directly)
  • DomainEventSubscriber - Uses a DomainEventListenerRegistr(ar|y) DomainEventSource instance to automate registering listeners (unchnaged)

ServiceWiring Implementation:

  • EventDispatchEngine - Implements DomainEventDispatcher and DomainEventListenerRegistr(ar|y) DomainEventSource (Only used in wiring. Previously: DomainEventDispatcher)

DomainEventSubscriber - Uses a DomainEventListenerRegistr(ar|y) instance to automate registering listeners (unchnaged)

I think you mean "Uses a DomainEventSource..." ya?

This comment was removed by daniel.

DomainEventSubscriber - Uses a DomainEventListenerRegistr(ar|y) instance to automate registering listeners (unchnaged)

I think you mean "Uses a DomainEventSource..." ya?

yes, sorry.

FYI, I have consolidated the content of the original in-process dispatcher design doc into the larger Domain Event System. The WE 5.2.3 dispatcher design doc is now a tab of the main doc.

I have made a tab for "Queued Events" design for future phase(s).