T116786 introduced MediaWiki #EventBus production [[https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/EventBus|via an extension]] utilizing hooks. While adequate for the [[https://phabricator.wikimedia.org/T114443|EventBus MVP]], this is only an interim solution. Ultimately, we need a mechanism that guarantees event delivery (eventual consistency is OK).
The [[ https://wikitech.wikimedia.org/wiki/Event_Platform | Event Platform ]] extended the work started in T116786 to provide standardized event production APIs, unified for both production and analytics purposes.
However, in order to build truly reliable new production services with events based on MediaWiki data, we need a single source of truth for MediaWiki data. That source of truth is the MediaWiki MySQL database, which is only consistently accessible by MediaWiki itself. There is currently no way to consistently expose (real time) MediaWiki data to non-MediaWiki applications.
We do have events produced by MediaWiki, but these events are decoupled from the MySQL writes, and there is no guarantee that e.g. every revision table save results in a mediawiki.revision-create event. This means that as of today, MediaWiki events cannot be relied on as a 'source of truth' for MediaWiki data. They are little more than a best-effort (albeit really good!) notification.
## Potential approaches
[[ https://martinfowler.com/eaaDev/EventSourcing.html | Event Sourcing ]] is an approach that event driven architectures use to ensure they have a single consistent source of truth that can be used to build many downstream applications. If we were building an application from scratch, this might be a great way to start. However, MediaWiki + MySQL already exist as our source of truth, and migrating it to an Event Sourced architecture all at once is intractable.
In lieu of completely re-architecting MediaWiki's data source, there are 2 possible approaches to solving this problem in a more incremental way.
---
### Change Data Capture (CDC)
CDC uses the MySQL replication binlog to produce state change events. This is the same source of data used to keep the read MySQL replicas up to date.
**Description**
A binlog reader such as [[ https://debezium.io/ | debezium ]] would produce database change events to Kafka. This reader may be able to transform the database change events into a more useful data model (e.g. [[ https://schema.wikimedia.org/repositories/primary/jsonschema/mediawiki/revision/create/latest | mediawiki/revision/create ]]), or transformation may be done later by a Stream Processing framework such as [[ https://flink.apache.org/ | Flink ]] or [[ https://kafka.apache.org/documentation/streams/ | Kafka Streams ]].
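As a rough sketch of the transformation step, the snippet below maps a Debezium-style row-change event for the `revision` table into a higher-level domain event. The field names (`op`, `source`, `after`, the `rev_*` columns, the `meta.stream` key) are illustrative assumptions, not the exact Debezium or mediawiki/revision/create schemas.

```python
def transform_change_event(change):
    """Map a low-level binlog row-change event to a domain-model event.

    Returns None for events this transformer doesn't handle.
    """
    # Debezium-style convention (assumed): op "c" means a row insert.
    if change["source"]["table"] != "revision" or change["op"] != "c":
        return None
    row = change["after"]  # the row state after the change
    return {
        "meta": {"stream": "mediawiki.revision-create"},
        "rev_id": row["rev_id"],
        "page_id": row["rev_page"],
        "rev_timestamp": row["rev_timestamp"],
    }

# Example low-level change event, as a binlog reader might emit it:
event = transform_change_event({
    "op": "c",
    "source": {"table": "revision"},
    "after": {"rev_id": 1001, "rev_page": 42, "rev_timestamp": "20240101000000"},
})
```

In a real deployment this function would run inside the connector (if it supports transforms) or in a Flink / Kafka Streams job consuming the raw change topic.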
**Pros**
* No (or minimal?) MediaWiki code changes needed
* Events are guaranteed to be produced for every database state change
* May be possible to guarantee each event is produced exactly once
* Would allow us to incrementally Event Source MediaWiki (if we wanted to)
**Cons**
* Events are emitted (by default?) in a low level database change model, instead of a higher level domain model, and need to be transformed by something
---
### Transactional Outbox
This is a hybrid method that makes use of database transactions and a separate poller process to produce events.
**Description**
Here's how this might work with the revision table:
When a revision is to be inserted into the MySQL `revision` table, a MySQL transaction is started. A record is inserted into both the `revision` table and the `revision_event_log` table. The MySQL transaction is committed. Since this is done in a transaction, we can be sure that both of the table writes happen atomically. The revision event is produced to Kafka. When the Kafka produce request succeeds, the `revision_event_log`'s `produced_at` timestamp (or boolean) field is set.
A separate process polls the `revision_event_log` table for records where `produced_at` is NULL, produces them to Kafka, and sets `produced_at` when the produce request succeeds.
If needed, `revision_event_log` records may be removed after they are successfully produced.
NOTE: This example is just one of various ways a Transactional Outbox might be implemented. The core idea is the use of MySQL transactions and a separate poller to ensure that all events are produced.
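A minimal sketch of the flow described above, using SQLite in place of MySQL and a plain callable in place of a Kafka producer. The `revision_event_log` table and `produced_at` column follow the hypothetical example; everything else here is illustrative.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER)")
db.execute(
    "CREATE TABLE revision_event_log "
    "(id INTEGER PRIMARY KEY, event TEXT, produced_at TEXT)"
)

def save_revision(rev_id, page_id):
    # Both inserts happen in one transaction ("with db" commits on success,
    # rolls back on error), so the revision row and its outbox record are atomic.
    with db:
        db.execute("INSERT INTO revision VALUES (?, ?)", (rev_id, page_id))
        db.execute(
            "INSERT INTO revision_event_log (event, produced_at) VALUES (?, NULL)",
            (json.dumps({"rev_id": rev_id, "page_id": page_id}),),
        )

def poll_and_produce(produce):
    # The poller picks up records not yet produced; produced_at is only set
    # after the produce call succeeds, giving an at-least-once guarantee.
    rows = db.execute(
        "SELECT id, event FROM revision_event_log WHERE produced_at IS NULL"
    ).fetchall()
    for row_id, event_json in rows:
        produce(json.loads(event_json))  # e.g. a Kafka produce request
        db.execute(
            "UPDATE revision_event_log "
            "SET produced_at = datetime('now') WHERE id = ?",
            (row_id,),
        )
    db.commit()

save_revision(1001, 42)
produced = []
poll_and_produce(produced.append)   # produces the pending event
poll_and_produce(produced.append)   # nothing left to produce
```

If the process crashes between the produce call and the `produced_at` update, the event is produced again on the next poll, which is where the at-least-once (rather than exactly-once) guarantee comes from.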
**Pros**
* Events can be emitted modeled as we choose
**Cons**
* Substantial MediaWiki code changes needed
* At-least-once delivery guarantee for events, but this should be fine. There may be easy ways to detect duplicate events.
---
### 2 Phase Commit with Kafka Transactions
This may or may not be possible, and requires more research if we want to consider it. Implementing it would likely be difficult and error prone, and could have an adverse effect on MediaWiki performance. If we do need Kafka Transactions, this might be impossible anyway, unless a good PHP Kafka client is written.