
Address data quality issues in the mediawiki_history dataset
Open, High, Public, 0 Story Points

Description

Parent task that aggregates all the known bugs and open questions around quality for the mediawiki_history dataset.

Event Timeline

Nuria created this task. Sep 20 2018, 2:09 PM
Nuria renamed this task from Data Lake Quality to Raise Edit Data Quality to the point where we can offer snapshots on Cloud (labs) environment. Sep 20 2018, 2:13 PM
Nuria updated the task description.
Milimetric moved this task from Incoming to Data Quality on the Analytics board. Sep 24 2018, 3:47 PM
Milimetric triaged this task as High priority.
Milimetric set the point value for this task to 0.
Milimetric moved this task from Next Up to Parent Tasks on the Analytics-Kanban board.
Milimetric removed JAllemandou as the assignee of this task.
Milimetric added a subscriber: JAllemandou.

Should I add other mediawiki_history data quality issues as subtasks here? For example, T218463.

We could use a place in Phab to track data quality issues in general, but perhaps this task was created with a narrower scope in mind. Perhaps Analytics-Data-Quality could serve that purpose, if it's okay for us to mess with the workboard there?

Nuria added a comment. Mar 20 2019, 9:02 PM

@Neil_P._Quinn_WMF, please add any other subtasks here that would be helpful.

The Analytics-Data-Quality tag includes this work, but also several other projects that have little to do with MediaWiki.