Page MenuHomePhabricator

Testing the domain event refactoring with production data
Closed, ResolvedPublic

Description

While as part of 5.2.13 we added comprehensive integration and unit tests suites to EventBus, we want to collect and analyze
some payloads from the production environment before fully deprecating the hooks API.

Testing plan:

  • create a new mediawiki.page_change.staging.v1 stream
  • have EventBus produce Domain Event generated payloads to mediawiki.page_change.staging.v1 .
  • compare Hooks and Domain Event payloads (offline)
  • deprecate Hooks

Event Timeline

Change #1148818 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/mediawiki-config@master] EventStreamConfig: add staging page_change stream

https://gerrit.wikimedia.org/r/1148818

Change #1148818 merged by jenkins-bot:

[operations/mediawiki-config@master] EventStreamConfig: add staging page_change stream

https://gerrit.wikimedia.org/r/1148818

Mentioned in SAL (#wikimedia-operations) [2025-05-22T14:12:42Z] <gmodena@deploy1003> Started scap sync-world: Backport for [[gerrit:1148818|EventStreamConfig: add staging page_change stream (T394899)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-22T14:14:36Z] <gmodena@deploy1003> gmodena: Backport for [[gerrit:1148818|EventStreamConfig: add staging page_change stream (T394899)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-05-22T14:28:40Z] <gmodena@deploy1003> Finished scap sync-world: Backport for [[gerrit:1148818|EventStreamConfig: add staging page_change stream (T394899)]] (duration: 15m 58s)

Good so far:

  • field-by-field showed that the payloads generated by domain events sources match the hooks. The discrepancies we found are explainable (variable fields, more accurate data in the domain event payload).
  • cardinality: number of edits is matching (modulo a single digit number of spurious records per day)

Needs fixing:

  • we are mixing creation events for file uploads. The largest impacted wiki is `commonswiki. Patch incoming.
  • asymmetric differences of delete: while the total count of delete events is consistent between the two streams, there is a substantial set of events that are present in one stream but missing in the other. Investigating.

Change #1154106 had a related patch set uploaded (by Gmodena; author: Gmodena):

[mediawiki/extensions/EventBus@master] PageChangeEventIngress: handle file upload

https://gerrit.wikimedia.org/r/1154106

Change #1154232 had a related patch set uploaded (by Gmodena; author: Gmodena):

[mediawiki/extensions/EventBus@master] PageChangeEmissionTest: assert on id of deleted page

https://gerrit.wikimedia.org/r/1154232

  • asymmetric differences of delete: while the total count of delete events is consistent between the two streams, there is a substantial set of events that are present in one stream but missing in the other. Investigating.

The asymmetry comes from different page_ids logged for the same page_title and revision id.

Let's look at an example:

Page id= 7903659 was recorded as deleted in the domain event stream (event.mediawiki_page_change_staging_v1 ), but is missing from the hooks one (event.mediawiki_page_change_v1 ).

In the domain event streams we find:

> select year, month, day, page.page_title, page.page_id, revision.rev_id from event.mediawiki_page_change_staging_v1 WHERE year = 2025 AND month = 6 and page.page_id = 7903659 and revision.rev_id = 87248538 and wiki_id = 'enwiki' and changelog_kind = 'delete' and page_change_kind='delete'; 

year	month	day	page_title	page_id	rev_id
2025	6	3	Mouth_Music	7903659	87248538

There is a entry for revision 87248538 and page title Mouth_Music in the hooks stream:

> select year, month, day, page.page_title, page.page_id, revision.rev_id, changelog_kind, page_change_kind from event.mediawiki_page_change_v1 WHERE year = 2025 AND month = 6 and page.page_title = 'Mouth_Music';

year	month	day	page_title	page_id	rev_id	changelog_kind	page_change_kind
2025	6	3	Mouth_Music	7903659	1293791709	update	move
2025	6	3	Mouth_Music	7894112	87248538	delete	delete

This specific delete event was tied to a move action (we don't emit those yet in the domain events stream). I looked up revision 87248538
in the archive table; the delete record generated from the hook seems correct (same page_id):

mysql:research@s1-analytics-replica.eqiad.wmnet [enwiki]> select ar_page_id from archive where ar_rev_id = '87248538';

+------------+
| ar_page_id |
+------------+
|    7894112 |
+------------+
1 row in set (0.002 sec)

Back to implementation. I was able to replicate the page_id missmatch with an integration tests: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/1154232
The assertion tests that the id of a deleted patch matches the serzialed event:

  • it passes when the event is serialized from PageSaveCompleteHook handler;
  • it fails when the event is serialized from the Domain Event handler (case reported in the patch).

The EventSerializer requries a WikiPage. In the PageChangeHook we init one from the PageIdentity received from the hook callback:

...
$wikiPage = $this->wikiPageFactory->newFromTitle( $page );
...

In the domain event handler we instantiate one from the delete page PageIdentity retrieved. from the PageDeletedEvent:

$wikiPage = $this->wikiPageFactory->newFromTitle( $event->getDeletedPage() );

I assumed the code paths would behave the same way, but probably this is where things got messy.

In the domain event handler we instantiate one from the delete page PageIdentity retrieved. from the PageDeletedEvent:
I assumed the code paths would behave the same way, but probably this is where things got messy.

Indeed. $this->wikiPageFactory->newFromTitle() will give you a WikiPage based on the *title*, and after the page move, that title is now associated with a different page ID. You can't use $this->wikiPageFactory->newFromId() either - that will return null, because the page with that ID has been deleted.

The EventSerializer requries a WikiPage.

Looking at PageChangeEventSerializer and PageEventSerializer, it should be straight forward to replace the type hints for WikiPage with type hints for (Existing)PageRecord, or even (Proper)PageIdentity. It doesn't look like any of the extra functionality of WikiPage is actually used. Note that WikiPage is a PageRecord, and Pagerecord is a PageIdentity, so this should all be backwards compatible.

It would be nice to have a more light weight solution in the interim, but I can't think of any. But then, WikiPage should be replaced anyway, so this is as good a time as any :)

Looking at PageChangeEventSerializer and PageEventSerializer, it should be straight forward to replace the type hints for WikiPage with type hints for (Existing)PageRecord, or even (Proper)PageIdentity. It doesn't look like any of the extra functionality of WikiPage is actually used. Note that WikiPage is a PageRecord, and Pagerecord is a PageIdentity, so this should all be backwards compatible.

It would be nice to have a more light weight solution in the interim, but I can't think of any. But then, WikiPage should be replaced anyway, so this is as good a time as any :)

WkiPage has been a source of subtle bugs, and we were planning to replace it. So, this is indeed a good excuse :) .
I refactored PageChangeEventSerializer and the unit test in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/1154232 now passes.

Change #1155233 had a related patch set uploaded (by Gmodena; author: Gmodena):

[mediawiki/extensions/EventBus@master] PageChangeEventSerializer: serialize PageIdentity

https://gerrit.wikimedia.org/r/1155233

Change #1154232 abandoned by Gmodena:

[mediawiki/extensions/EventBus@master] PageChangeEmissionTest: assert on id of deleted page

Reason:

Superseded by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/1155233

https://gerrit.wikimedia.org/r/1154232

Change #1154106 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] PageChangeEventIngress: handle file upload

https://gerrit.wikimedia.org/r/1154106

Change #1155233 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] PageChangeEventSerializer: serialize PageIdentity

https://gerrit.wikimedia.org/r/1155233

Change #1163344 had a related patch set uploaded (by Gmodena; author: Gmodena):

[mediawiki/extensions/EventBus@master] WIP: Remove deprecated PageChangeHooks handlers

https://gerrit.wikimedia.org/r/1163344

Change #1163344 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] Remove PageChangeHooks handlers

https://gerrit.wikimedia.org/r/1163344

Change #1166827 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/mediawiki-config@master] EventStreamConfig: remove staging page change conf

https://gerrit.wikimedia.org/r/1166827