
Update mediawiki-history subgraph-partitioner so that it uses [page/user]_id in addition to title/text
Open, Normal, Public

Description

We have had page_id in log events since ~2014. Subgraph partitioning helps rebuild history when page_id is not present, but it can also break lineage in case of data inconsistency (events with the same page_id end up in different subgraphs because a move event has been lost). There should be a way to solve this, either through hierarchical graph partitioning or by using a two-step job (separate events that have a page_id from those that don't, and apply subgraph partitioning only to those without).
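A minimal sketch of the two-step option, assuming a simplified event model. The `Event` fields, the `UnionFind` helper, the `partition_events` function and the partition-key format are all hypothetical illustrations, not the mediawiki-history job's actual schema or API. Events with a page_id are grouped by that id directly; only the remaining events go through title-based subgraph partitioning, where a move event links its old and new titles (connected components via union-find).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event model (not the real mediawiki-history schema).
    event_id: int
    page_id: Optional[int]            # None for log events without page_id (pre ~2014)
    title: str                        # title the event applies to
    new_title: Optional[str] = None   # set only for move events


class UnionFind:
    """Disjoint-set over title strings, used to find title-connected subgraphs."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def partition_events(events: List[Event]) -> Dict[str, List[int]]:
    """Two-step partitioning: by page_id when present, by title subgraph otherwise."""
    partitions: Dict[str, List[int]] = {}

    # Step 1: events carrying a page_id are grouped by that id directly,
    # so a lost move event cannot split them into different subgraphs.
    for e in events:
        if e.page_id is not None:
            partitions.setdefault(f"page:{e.page_id}", []).append(e.event_id)

    # Step 2: title-based subgraph partitioning, only for events without page_id.
    without_id = [e for e in events if e.page_id is None]
    uf = UnionFind()
    for e in without_id:
        uf.find(e.title)
        if e.new_title is not None:
            uf.union(e.title, e.new_title)  # a move links old and new titles
    for e in without_id:
        partitions.setdefault(f"titles:{uf.find(e.title)}", []).append(e.event_id)

    return partitions


if __name__ == "__main__":
    events = [
        Event(1, 10, "A"),
        Event(2, 10, "A", new_title="B"),  # move A -> B, with page_id
        Event(3, 10, "C"),                 # same page; the B -> C move was lost
        Event(4, None, "X"),
        Event(5, None, "X", new_title="Y"),
        Event(6, None, "Y"),
        Event(7, None, "Z"),
    ]
    for key, ids in partition_events(events).items():
        print(key, ids)
    # page:10 [1, 2, 3]
    # titles:X [4, 5, 6]
    # titles:Z [7]
```

With title-only partitioning, event 3 would land in a different subgraph from events 1 and 2 because the B -> C move is missing; grouping on page_id first keeps the three together, while events without page_id are still reconstructed through their move chain.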
Note: This task was created after being mentioned in T213603.

Event Timeline

Restricted Application added a subscriber: Aklapper. Mar 12 2019, 5:14 PM
Milimetric moved this task from Incoming to Data Quality on the Analytics board. Mar 18 2019, 3:45 PM
Milimetric triaged this task as Normal priority.
JAllemandou renamed this task from "Update mediawiki-history subgraph-partitioner so that it uses page_id for pages" to "Update mediawiki-history subgraph-partitioner so that it uses [page/user]_id in addition to title/text". Mar 19 2019, 8:20 PM