
Data exploration capabilities in Superset and Turnilo for editing data at “editor” level
Closed, Duplicate; Public

Description

  • We have begun surfacing editing data in our internal reporting tools, but they need to be restructured to make them more user-friendly
  • This work will involve video meetings with stakeholders to align on standard terminology in addition to configuring Superset and Turnilo to properly display dimensions and metrics

Event Timeline

kzimmerman renamed this task from Data exploration capabilities in Superset and Turnilo for editing data at the “edits” and “editor” levels to Data exploration capabilities in Superset and Turnilo for editing data at “editor” level. Sep 11 2019, 8:13 PM

From Neil in T224067:

We have now added an edits_hourly cube to Druid (T211173), but while that makes it possible to count edits, it doesn't make it possible to count distinct editors. We can't simply add a user_name column to that cube, because Druid is not well suited to columns with many distinct values.

Instead, we should create a separate cube where each row corresponds to the aggregate behavior of a single editor on a single wiki during a single month (essentially, an editor-month dataset, but with a much richer schema than described on that page).
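As a rough illustration of the editor-month shape described above, here is a minimal Python sketch that rolls individual edit events up into one row per editor, wiki, and month. The event tuples, the function name editor_month_rows, and the column names (wiki, user_name, month, edit_count) are all hypothetical; the real cube would carry a much richer schema and would be built in Hive or Druid rather than in Python.

```python
from collections import Counter
from datetime import datetime

# Hypothetical edit events: (wiki, user_name, timestamp). Illustrative only.
edits = [
    ("enwiki", "Alice", datetime(2019, 9, 1, 10)),
    ("enwiki", "Alice", datetime(2019, 9, 15, 12)),
    ("dewiki", "Alice", datetime(2019, 9, 20, 8)),
    ("enwiki", "Bob", datetime(2019, 9, 3, 9)),
    ("enwiki", "Bob", datetime(2019, 10, 2, 9)),
]


def editor_month_rows(events):
    """Aggregate edit events into one row per (wiki, editor, month)."""
    counts = Counter(
        (wiki, user, ts.strftime("%Y-%m")) for wiki, user, ts in events
    )
    # Column names are assumptions made for this sketch.
    return [
        {"wiki": wiki, "user_name": user, "month": month, "edit_count": n}
        for (wiki, user, month), n in sorted(counts.items())
    ]


rows = editor_month_rows(edits)
for row in rows:
    print(row)
```

Each output row describes one editor on one wiki in one month, so the row count grows with editors per wiki per month rather than with individual edits.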

Some initial thoughts:

  • I currently use an editor-month dataset (neilpquinn.editors_monthly) to calculate active editors for movement metrics. We should make sure this dataset is available in Hive and use it for the movement metrics.
  • With the cube set up in this fashion, it would not be possible for Turnilo users to calculate the global number of active editors, because the rows will be split by wiki and won't actually identify the users concerned (just as edits_hourly doesn't actually identify the edits concerned). If this is a serious concern, we can add a separate cube where a single row corresponds to a single editor across all wikis during a single month.
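To make the global-count limitation concrete, here is a small Python sketch under stated assumptions: the rows are made up, the tuple layout is hypothetical, and the threshold of 5 edits per month for an "active" editor is assumed for illustration. It shows that summing per-wiki active-editor counts, which is all a cube without user identities allows, double-counts an editor who is active on more than one wiki.

```python
from collections import defaultdict

ACTIVE_THRESHOLD = 5  # assumed definition of "active": at least 5 edits in a month

# Hypothetical editor-month rows: (wiki, month, user_name, edit_count).
rows = [
    ("enwiki", "2019-09", "Alice", 5),
    ("dewiki", "2019-09", "Alice", 7),
    ("enwiki", "2019-09", "Bob", 6),
    ("frwiki", "2019-09", "Carol", 2),
]


def active_per_wiki(rows):
    """Count active editors per (wiki, month), as a per-wiki cube could."""
    counts = defaultdict(int)
    for wiki, month, _user, edit_count in rows:
        if edit_count >= ACTIVE_THRESHOLD:
            counts[(wiki, month)] += 1
    return dict(counts)


def active_global(rows):
    """Count active editors per month across all wikis; needs user identity."""
    totals = defaultdict(int)
    for _wiki, month, user, edit_count in rows:
        totals[(user, month)] += edit_count
    counts = defaultdict(int)
    for (_user, month), total in totals.items():
        if total >= ACTIVE_THRESHOLD:
            counts[month] += 1
    return dict(counts)


summed = sum(active_per_wiki(rows).values())
global_count = active_global(rows)["2019-09"]
print(summed, global_count)  # 3 2
```

Alice is counted once on enwiki and once on dewiki, so the per-wiki sum is 3 while only 2 distinct editors are active. A cube with one row per editor across all wikis per month would correspond to the second function.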

LGoto triaged this task as Medium priority. Oct 21 2019, 5:29 PM
LGoto edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

The deployment of the dataset this depends on has been delayed; moving this to Q4.