
Content dashboard for use in Superset around US Election
Closed, Resolved | Public | Spike

Description

As an initial version of the sample content dataset, we want a dataset available in Superset to answer potential content-related questions around the US Election.

  • Last 12 months of data, daily granularity
  • Top 500 pages by page views over the past year (initially for the top 3 wikis by traffic from the US: enwiki, zhwiki, eswiki)
  • Topics as an array (from joining with isaacj.article_topics_outlinks_2020_09 on wiki_db and pageid)
  • Page views (by access_method & agent_type)
  • Log-transformed proportion of total views (easier to store than a tiny decimal)
  • Edit count (by user_is_anonymous & user_is_bot)
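
A minimal Python sketch of two of the fields above: selecting the top pages by total page views and computing the log-transformed proportion of views. This is an illustration under assumptions, not the task's actual pipeline: the function names, the input row shape, applying the top-N cut per wiki, and the use of the natural log are all hypothetical choices that the description does not specify.

```python
import math
from collections import defaultdict


def top_pages(rows, n=500):
    """Return the top-n page ids per wiki, ranked by total views.

    rows: iterable of (wiki_db, page_id, views) tuples (assumed shape).
    Applying the cut per wiki is an assumption; the task text does not
    say whether the top 500 is per wiki or overall.
    """
    totals = defaultdict(int)
    for wiki_db, page_id, views in rows:
        totals[(wiki_db, page_id)] += views

    by_wiki = defaultdict(list)
    for (wiki_db, page_id), views in totals.items():
        by_wiki[wiki_db].append((views, page_id))

    # Sort by views descending, then page_id ascending to break ties.
    return {
        wiki_db: [pid for _, pid in sorted(pages, key=lambda t: (-t[0], t[1]))[:n]]
        for wiki_db, pages in by_wiki.items()
    }


def log_proportion(page_views, total_views):
    """Natural log of a page's share of total views (log base is an assumption).

    Returns None when the share is undefined or zero, since log(0) does not exist.
    """
    if page_views <= 0 or total_views <= 0:
        return None
    return math.log(page_views / total_views)
```

For example, a page with 10 of 1,000 views has a proportion of 0.01, which is stored as `math.log(0.01)`, roughly -4.6, rather than as a small decimal.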

We can then write Presto queries to expose it as explorable datasources and create a dashboard in Superset with the following information:

  • top articles daily/monthly by pageviews/edits, with topics, by wiki
  • pageviews and edits for politics- and society-related topics, by wiki
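
The second view can be sketched in Python as a filter on the topics array followed by a per-wiki sum. This is only an illustration: the function name, the row keys (`wiki_db`, `topics`, `views`, `edits`), and the topic labels in the usage example are hypothetical placeholders, and the real labels in the topics table may differ.

```python
from collections import defaultdict


def totals_by_wiki_for_topics(rows, wanted_topics):
    """Sum views and edits per wiki for pages tagged with any wanted topic.

    rows: iterable of dicts with keys 'wiki_db', 'topics' (list of str),
    'views' and 'edits' (assumed shape, not the actual table schema).
    """
    wanted = set(wanted_topics)
    totals = defaultdict(lambda: {"views": 0, "edits": 0})
    for row in rows:
        # Keep the row if its topics array shares at least one label.
        if wanted.intersection(row["topics"]):
            bucket = totals[row["wiki_db"]]
            bucket["views"] += row["views"]
            bucket["edits"] += row["edits"]
    return dict(totals)


# Usage with made-up rows and placeholder topic labels.
sample_rows = [
    {"wiki_db": "enwiki", "topics": ["Politics", "History"], "views": 100, "edits": 3},
    {"wiki_db": "enwiki", "topics": ["Sports"], "views": 50, "edits": 1},
    {"wiki_db": "eswiki", "topics": ["Society"], "views": 20, "edits": 2},
]
result = totals_by_wiki_for_topics(sample_rows, ["Politics", "Society"])
```

In Presto the same filter would be expressed against the array column directly; the Python version is only meant to make the intended logic explicit.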

Event Timeline

cchen renamed this task from Content dataset for use in Superset around US Election to Content dashboard for use in Superset around US Election. Nov 5 2020, 7:39 PM
cchen updated the task description.
cchen triaged this task as Medium priority. Nov 10 2020, 6:05 PM
cchen moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

I created a dashboard of the top viewed pages from the US, with topics, and emailed the stakeholders with the update.