Data quality issues in the mediawiki_history data stream
Closed, Resolved; Public; 0 Estimated Story Points

Description

Parent task that aggregates all the known bugs and open questions around quality for the mediawiki_history data stream.

Event Timeline

Nuria renamed this task from Data Lake Quality to Raise Edit Data Quality to the point where we can offer snapshots on Cloud (labs) environment. Sep 20 2018, 2:13 PM
Nuria updated the task description.
Milimetric moved this task from Incoming to Data Quality on the Analytics board.
Milimetric set the point value for this task to 0.
Milimetric moved this task from Next Up to Parent Tasks on the Analytics-Kanban board.
Milimetric removed JAllemandou as the assignee of this task.
Milimetric added a subscriber: JAllemandou.

Should I add other mediawiki_history data quality issues as subtasks here? For example, T218463.

We could use a place in Phab to track data quality issues in general, but perhaps this task was created with a very specific scope in mind. Perhaps Analytics-Data-Quality, if it's okay for us to mess with the workboard there?

Please add other subtasks here, @Neil_P._Quinn_WMF; that would be helpful.

The Analytics-Data-Quality tag includes this work but also several other projects that have little to do with MediaWiki.

nshahquinn-wmf renamed this task from Address data quality issues in the mediawiki_history dataset to Data quality issues in the mediawiki_history data stream. Oct 30 2020, 10:30 AM
nshahquinn-wmf lowered the priority of this task from High to Medium.
nshahquinn-wmf updated the task description.