
Spark Streaming Dumps POC: Backfill metadata table
Closed, Resolved, Public

Description

For the purpose of this POC, the minimal schema can be: (wiki_db string, page_id bigint, revision_id bigint, revision_deleted_parts array<string>). It might also be interesting to include an is_latest boolean column to track which revision is the latest for each page, and to measure how fast updates to that column are in Iceberg at our volume.
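As a rough sketch of the schema above, the Python below builds a Spark SQL statement for an Iceberg table with those columns and shows one way the is_latest flag could be derived. The table name poc.revision_metadata, the helper names, and the rule that the highest revision_id per (wiki_db, page_id) counts as latest are all assumptions for illustration, not something the task specifies.

```python
# Columns from the task description; is_latest is the optional extra column.
COLUMNS = [
    ("wiki_db", "string"),
    ("page_id", "bigint"),
    ("revision_id", "bigint"),
    ("revision_deleted_parts", "array<string>"),
    ("is_latest", "boolean"),
]


def create_table_ddl(table_name, columns=COLUMNS):
    """Build a Spark SQL CREATE TABLE statement for an Iceberg table.

    table_name is hypothetical; pass whatever catalog/database the POC uses.
    """
    cols = ",\n".join(f"    {name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE IF NOT EXISTS {table_name} (\n{cols}\n) USING iceberg"


def mark_latest(rows):
    """Return copies of rows with an is_latest flag added.

    Assumption: the latest revision of a page is the one with the highest
    revision_id within the same (wiki_db, page_id).
    """
    latest = {}
    for row in rows:
        key = (row["wiki_db"], row["page_id"])
        latest[key] = max(latest.get(key, row["revision_id"]), row["revision_id"])
    return [
        dict(row, is_latest=row["revision_id"] == latest[(row["wiki_db"], row["page_id"])])
        for row in rows
    ]


if __name__ == "__main__":
    print(create_table_ddl("poc.revision_metadata"))
    sample = [
        {"wiki_db": "enwiki", "page_id": 1, "revision_id": 10},
        {"wiki_db": "enwiki", "page_id": 1, "revision_id": 12},
    ]
    for row in mark_latest(sample):
        print(row)
```

In an actual backfill the flag would more likely be computed in Spark itself; the pure-Python version only illustrates the intended semantics of the column.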

Event Timeline

Milimetric moved this task from Next Up to In Progress on the Event-Platform (Sprint 05) board.
Milimetric updated the task description.

Resolving; we have moved forward with Dumps 2.0.