The page_id field has been present in log events since around 2014. Subgraph partitioning helps rebuild history when page_id is absent, but it can also break lineage when the data is inconsistent: events with the same page_id end up in different subgraphs because a move event has been lost. There should be a way to solve this, either through hierarchical graph partitioning or with a two-step job (separate events that have a page_id from those that do not, and apply subgraph partitioning only to those without).
Note: This task was created after the issue was mentioned in T213603.
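As a rough illustration of the two-step idea described above, here is a minimal Python sketch. It is not the mediawiki-history implementation. The event shape (dicts with page_id, old_title and new_title keys) and the function name partition_events are assumptions made for this example only. Events carrying a page_id are grouped directly by that id, so a lost move event cannot split them; title-based connected components (via a small union-find) are computed only for events without a page_id. Linking the two resulting sets back together is deliberately left out.

```python
from collections import defaultdict


def partition_events(events):
    """Two-step partitioning sketch (hypothetical event shape).

    Each event is a dict with 'page_id' (int or None), 'old_title'
    and 'new_title'. Returns (groups_by_page_id, title_subgraphs).
    """
    # Step 1: events with a page_id are grouped by that id directly.
    with_id = defaultdict(list)
    without_id = []
    for event in events:
        if event.get("page_id") is not None:
            with_id[event["page_id"]].append(event)
        else:
            without_id.append(event)

    # Step 2: title-based connected components, only for events
    # that have no page_id. A simple union-find over titles.
    parent = {}

    def find(title):
        parent.setdefault(title, title)
        while parent[title] != title:
            parent[title] = parent[parent[title]]  # path halving
            title = parent[title]
        return title

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for event in without_id:
        union(event["old_title"], event["new_title"])

    subgraphs = defaultdict(list)
    for event in without_id:
        subgraphs[find(event["old_title"])].append(event)

    return dict(with_id), list(subgraphs.values())
```

In this sketch, two events sharing a page_id stay in the same group even when the move event that would connect their titles is missing, which is the inconsistency case described above.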
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T221828 Mediawiki-history release - Backlog
Open | | None | T218130 Update mediawiki-history subgraph-partitioner so that it uses [page/user]_id in addition to title/text