- We have begun surfacing editing data in our internal reporting tools, but the tools need to be restructured to make them more user-friendly.
- This work will involve video meetings with stakeholders to align on standard terminology, as well as configuring Superset and Turnilo to properly display dimensions and metrics.
|Status|Assigned|Task|
|---|---|---|
|Open|None|T298924 Superset - Product Analytics Canonical Dashboards, Reports, and Datasets|
|Duplicate|cchen|T230092 Data exploration capabilities in Superset and Turnilo for editing data at “editor” level|
|Resolved|cchen|T245049 Draft for Editor schemas|
|Resolved|cchen|T262954 Add editors metrics to Superset key metrics dashboards|
|Declined|None|T256719 Add editors_monthly data to Druid|
|Open|Mayakp.wiki|T256050 Add dimensions to editors_daily dataset|
From Neil in T224067:
We have now added an edits_hourly cube to Druid (T211173), but while that makes it possible to count edits, it doesn't make it possible to count distinct editors. We can't simply add a user_name column to that cube, because Druid is not well suited to columns with many distinct values.
Instead, we should create a separate cube where each separate row corresponds to the aggregate behavior of a single editor on a single wiki during a single month (essentially, an editor-month dataset, but with a much richer schema than described on that page).
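To make the proposed row shape concrete, here is a minimal Python sketch of collapsing raw edit events into editor-month rows, one row per editor per wiki per month. The sample events, the field names (`wiki`, `month`, `user_name`, `edit_count`) and both helper functions are assumptions made for illustration; they are not the actual schema of `edits_hourly` or of any existing dataset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical edit events: (wiki, user_name, ISO 8601 timestamp).
edits = [
    ("enwiki", "Alice", "2019-05-01T10:00:00"),
    ("enwiki", "Alice", "2019-05-14T08:30:00"),
    ("enwiki", "Bob", "2019-05-02T12:00:00"),
    ("dewiki", "Alice", "2019-05-03T09:15:00"),
    ("enwiki", "Alice", "2019-06-01T11:00:00"),
]


def editor_month_rows(events):
    """Collapse edit events into one row per (wiki, month, editor)."""
    counts = Counter()
    for wiki, user_name, timestamp in events:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        counts[(wiki, month, user_name)] += 1
    return [
        {"wiki": wiki, "month": month, "user_name": user_name, "edit_count": n}
        for (wiki, month, user_name), n in sorted(counts.items())
    ]


def editors_per_wiki_month(rows):
    """Count distinct editors per (wiki, month); each row is one editor."""
    return Counter((row["wiki"], row["month"]) for row in rows)


rows = editor_month_rows(edits)
editor_counts = editors_per_wiki_month(rows)
```

Because every row already represents exactly one editor, counting editors for a wiki and month reduces to counting rows, so the cube itself would not need a high-cardinality `user_name` column at query time.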
Some initial thoughts:
- I currently use an editor-month dataset (neilpquinn.editors_monthly) to calculate active editors for movement metrics. We should make sure this dataset is available in Hive and use it for the movement metrics.
- With the cube set up in this fashion, it would not be possible for Turnilo users to calculate the global number of active editors, because the rows will be split by wiki and won't actually identify the users concerned (just as edits_hourly doesn't actually identify the edits concerned). If this is a serious concern, we can add a separate cube where a single row corresponds to a single editor across all wikis during a single month.
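The limitation on global counts can be illustrated with a small Python sketch. The sample rows and function names below are hypothetical, and the sketch counts distinct editors only, ignoring any activity threshold. It compares what a per-wiki cube can offer (per-wiki counts summed across wikis) with the true number of distinct editors.

```python
from collections import defaultdict

# Hypothetical editor-month rows: (wiki, month, user_name).
editor_wiki_month = [
    ("enwiki", "2019-05", "Alice"),
    ("dewiki", "2019-05", "Alice"),
    ("enwiki", "2019-05", "Bob"),
    ("frwiki", "2019-05", "Carol"),
]


def summed_per_wiki_count(rows, month):
    """Sum per-wiki distinct-editor counts, as a per-wiki cube would allow."""
    editors_by_wiki = defaultdict(set)
    for wiki, row_month, user_name in rows:
        if row_month == month:
            editors_by_wiki[wiki].add(user_name)
    return sum(len(editors) for editors in editors_by_wiki.values())


def global_distinct_count(rows, month):
    """Count each editor once, no matter how many wikis they edited."""
    return len({user for _, row_month, user in rows if row_month == month})


summed = summed_per_wiki_count(editor_wiki_month, "2019-05")
distinct = global_distinct_count(editor_wiki_month, "2019-05")
```

In this sample, Alice edits two wikis, so the summed figure is 4 while the distinct figure is 3. Once user identity is dropped from the per-wiki rows, the overlap can no longer be recovered, which is why a separate cross-wiki cube would be needed.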