Address data quality issues in the mediawiki_history dataset
Open, High | Public | 0 Story Points

Description

Parent task that aggregates all the known bugs and open questions around quality for the mediawiki_history dataset.

Event Timeline

Nuria created this task. Sep 20 2018, 2:09 PM
Nuria renamed this task from Data Lake Quality to Raise Edit Data Quality to the point where we can offer snapshots on Cloud (labs) environment. Sep 20 2018, 2:13 PM
Nuria updated the task description.
Milimetric triaged this task as High priority. Sep 24 2018, 3:47 PM
Milimetric moved this task from Incoming to Data Quality on the Analytics board.
Milimetric set the point value for this task to 0.
Milimetric moved this task from Next Up to Parent Tasks on the Analytics-Kanban board.
Milimetric removed JAllemandou as the assignee of this task.
Milimetric added a subscriber: JAllemandou.

Should I add other mediawiki_history data quality issues as subtasks here? For example, T218463.

We could use a place in Phab to track data quality issues in general, but perhaps this task was created with a very specific scope in mind. Perhaps Analytics-Data-Quality, if it's okay for us to mess with the workboard there?

Nuria added a comment. Mar 20 2019, 9:02 PM

Please add other subtasks here, @Neil_P._Quinn_WMF; that would be helpful.

The Analytics-Data-Quality tag includes this work but also several other projects that have little to do with MediaWiki.