Taking a very TDD approach here: see whether, with Hudi, we can bridge two MediaWiki snapshots for simplewiki when it comes to, for example, page reverts. This means we will use one of the event feeds and try to rebuild reverts from the incoming data.
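To make "rebuild reverts from incoming data" concrete, here is a minimal sketch of a simplified, sha1-based identity-revert check. It assumes that the state carried over from the older snapshot is available per page as an ordered list of revisions, and that incoming event revisions can be ordered by `rev_id`. The names `Revision` and `detect_identity_reverts` are hypothetical and are not part of refinery or Hudi. This is not the production mediawiki-history revert algorithm, only an illustration of the bridging idea.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    # Hypothetical minimal revision record; real events carry many more fields.
    rev_id: int
    page_id: int
    sha1: str


def detect_identity_reverts(history, incoming):
    """Return {reverting_rev_id: restored_rev_id} for incoming revisions.

    history: dict page_id -> list[Revision] from the older snapshot, in order.
    incoming: iterable of Revision from the event feed (assumed orderable by rev_id).
    """
    # Map (page, content hash) -> most recent revision with that content.
    seen = {}
    for page_id, revs in history.items():
        for r in revs:
            seen[(page_id, r.sha1)] = r.rev_id
    # Track the latest revision per page so null edits are not counted as reverts.
    last_rev = {page_id: revs[-1] for page_id, revs in history.items() if revs}

    reverts = {}
    for r in sorted(incoming, key=lambda rev: rev.rev_id):
        key = (r.page_id, r.sha1)
        prev = last_rev.get(r.page_id)
        # Identity revert: content matches an earlier revision of the same page
        # but differs from the immediately preceding revision.
        if key in seen and (prev is None or prev.sha1 != r.sha1):
            reverts[r.rev_id] = seen[key]
        seen[key] = r.rev_id
        last_rev[r.page_id] = r
    return reverts
```

For example, with a snapshot page history of contents `a`, `b` and an incoming edit restoring `a`, the incoming revision is reported as reverting to the first one. In an incremental setup, `history` would stand in for whatever state is read back from the previous snapshot (an assumption about the pipeline, not a description of it).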
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
[WIP] Spike so far | analytics/refinery/source | master | +112 -0
Status | Assigned | Task
---|---|---
Declined | None | T258511 Data Lake incremental Data Updates
Open | None | T231938 Get "edits hourly" on a daily basis
Resolved | Milimetric | T258532 [SPIKE] Prototype of incremental updates for mediawiki history for simplewiki , including reverts using apache hudi
Declined | None | T262205 Need for new event-type - `user_create` and `user_rename`
Resolved | JAllemandou | T262256 Test hudi and Iceberg as an incremental update system using 2 mediawiki-history snapshots
Declined | None | T262260 Make hudi work with Hive
Resolved | JAllemandou | T262261 Check whether mediawiki production event data is equivalent to mediawiki-history data over a month
Resolved | Milimetric | T215001 Revisions missing from mediawiki_revision_create
Resolved | None | T280538 Capture rev_is_revert event data in a stream different than mediawiki.revision-create
Declined | Milimetric | T263055 Add log entry details to page and user events in EventBus
Event Timeline
Change 618874 had a related patch set uploaded (by Milimetric; owner: Milimetric):
[analytics/refinery/source@master] [WIP] Spike so far