Deliverable
Create content datasets to explore in Superset around the following use cases:
- Top topics on Wikipedia by pageviews
- Top topics on Wikipedia by edits, categorized by editor type (anonymous editor vs registered editor)
Acceptance Criteria
- datasets can be manually updated (automating is out of scope)
- pageviews and edits datasets are updated monthly, through the end of Q4
- Pageviews dataset fields: pageviews, project, country, topics, date
- Edits dataset fields: edits, project, bot/non-bot, editor type (anonymous editor vs registered editor), topics, date
- datasets are QAed (code review, reasonableness checks)
- datasets can be explored in Superset
- demo of results/data in Product Leads meeting
- data dictionary entries for staging tables, with links to relevant code
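As an illustration of the "reasonableness checks" criterion, here is a minimal sketch of a row-level check that could be run in the update notebook. The column names (`is_bot`, `editor_type`, and so on), the allowed editor type values, and the helper `check_rows` are assumptions derived from the field lists above, not the actual staging table schema; the sample rows are made up.

```python
from datetime import date

# Assumed column names, derived from the acceptance criteria;
# the real staging tables may name them differently.
PAGEVIEWS_COLUMNS = {"pageviews", "project", "country", "topics", "date"}
EDITS_COLUMNS = {"edits", "project", "is_bot", "editor_type", "topics", "date"}
EDITOR_TYPES = {"anonymous", "registered"}  # assumed labels


def check_rows(rows, expected_columns, count_column):
    """Return a list of human-readable problems found in `rows`."""
    problems = []
    for i, row in enumerate(rows):
        missing = expected_columns - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        if row[count_column] < 0:
            problems.append(f"row {i}: negative {count_column}")
        if not isinstance(row["date"], date):
            problems.append(f"row {i}: date is not a date object")
        if "editor_type" in expected_columns and row["editor_type"] not in EDITOR_TYPES:
            problems.append(f"row {i}: unknown editor_type {row['editor_type']!r}")
    return problems


# Made-up sample rows, for illustration only.
sample_edits = [
    {"edits": 120, "project": "en.wikipedia", "is_bot": False,
     "editor_type": "registered", "topics": "Biography", "date": date(2023, 1, 1)},
    {"edits": -5, "project": "en.wikipedia", "is_bot": False,
     "editor_type": "anonymous", "topics": "Sports", "date": date(2023, 1, 1)},
]

for problem in check_rows(sample_edits, EDITS_COLUMNS, "edits"):
    print(problem)
```

The same helper can be pointed at the pageviews data by passing `PAGEVIEWS_COLUMNS` and `"pageviews"`; checks on month-over-month totals would need to be added separately.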
Deliveries:
- Slide deck presented in Product Leads meeting: link
Pageview data:
- Dashboard in Superset: Pageview_Topics_Dashboard
- Documentation for content pageview dataset: link
- Notebook to update content_pv table
Edit data: