
[SPIKE] PoC to implement an example pipeline for bringing data into MediaWiki
Closed, Resolved, Public

Description

We want to implement a maintenance script that consumes delete events (conforming to the page_change JSON schema) from stdin, deserializes them into PageDeletedEvent objects, and dispatches them.

@daniel has some PoC code we can piggyback on.

Some questions we want to answer:

  • Is it possible to deserialize events produced by EventBus into PHP? What are the tradeoffs with respect to cross-language support?
  • What kind of Domain Event metadata would we need to extend (or wrap around) page_change?
  • How do we register the producer?
  • How would events be dispatched to consumers?
  • What would Kafka integration look like? Do we need Kafka consumer code at all?

Using stdin as the input source keeps us open to solutions orthogonal to a PHP Kafka client (e.g., approaches similar to Mercurius).
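
To make the shape of the pipeline concrete, here is a minimal, language-neutral sketch in Python (not the PHP PoC itself) of the stdin-driven flow described above: read newline-delimited JSON, keep the delete events, turn each into an event object, and hand it to listeners. Everything in it is an assumption for illustration: the `PageDeletedEvent` dataclass is a hypothetical stand-in for the MediaWiki class, `read_delete_events` and `dispatch` are made-up helper names, and the field names (`page_change_kind`, `wiki_id`, `page.page_id`, `page.page_title`) reflect my reading of the page_change schema and should be checked against it.

```python
import json
import sys
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, TextIO


@dataclass(frozen=True)
class PageDeletedEvent:
    """Hypothetical stand-in for MediaWiki's PageDeletedEvent."""
    wiki_id: str
    page_id: int
    page_title: str


def read_delete_events(stream: TextIO) -> Iterator[PageDeletedEvent]:
    """Yield one event per 'delete' record in a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Assumed field name; other kinds (edit, move, ...) are skipped.
        if record.get("page_change_kind") != "delete":
            continue
        page = record["page"]
        yield PageDeletedEvent(
            wiki_id=record["wiki_id"],
            page_id=page["page_id"],
            page_title=page["page_title"],
        )


def dispatch(
    events: Iterable[PageDeletedEvent],
    listeners: Iterable[Callable[[PageDeletedEvent], None]],
) -> int:
    """Call every listener for every event; return the number of events seen."""
    listeners = list(listeners)
    count = 0
    for event in events:
        for listener in listeners:
            listener(event)
        count += 1
    return count
```

Calling `dispatch(read_delete_events(sys.stdin), [print])` would print each deserialized delete event. Because the reader only depends on a text stream, whatever sits upstream (a Kafka CLI consumer, a file, a Mercurius-like relay) can be swapped without touching the deserialization or dispatch code.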

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Add asciicast | repos/data-engineering/php-kafka-consumer!1 | gmodena | add-asciicast | main

Event Timeline

gmodena renamed this task from "[SPIKE] PoC to implement an example pipeline for bringing data into MW" to "[SPIKE] PoC to implement an example pipeline for bringing data into MediaWiki". (Jun 13 2025, 6:07 PM)
gmodena updated the task description.

Change #1166207 had a related patch set uploaded (by Gmodena; author: Gmodena):

[mediawiki/core@master] maintenance: inbound domain event demo

https://gerrit.wikimedia.org/r/1166207