
"Content" equivalent of pageviews daily or edits_hourly available to use in Turnilo and Superset
Open, Medium, Public

Description

Context

The product department has developed a framework for thinking about how value is created through our ecosystem.

  • Content drives readership (content and pageviews ✔️)
  • Readership inspires/drives editors (editors and editor lifecycle)
  • Editors create content (edits ✔️ and content)
  • Content drives readership, and so on.

We are working on understanding the components of this "flywheel", or virtuous cycle, better, and better data is key to unlocking this. The two remaining pillars of the model are editors and content. This ticket is about content.

The opportunity

There is an important class of questions that we cannot answer today, and that gap limits our ability to make product decisions:

What kind of content drives readership?

A few samples of the questions that come from this:

  • How many pages drive 90% of traffic?
  • What topics do our users care most about? We often go by the number of pages affected by a change rather than the number of pageviews. This tends to favor things like beetle species or proteins over television shows.
  • What is the ratio between quality, effort and consumption? Are there areas where there is a missed opportunity? (Shoutout to @nettrom_WMF's work.)
  • What is the average # of watchers for highly trafficked pages and has that changed over time?
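
As a rough illustration of the first question, the sketch below counts how many pages are needed to reach a given share of total pageviews. The function name `pages_for_traffic_share` and the sample counts are hypothetical and not drawn from any real dataset.

```python
from typing import Dict


def pages_for_traffic_share(pageviews: Dict[str, int], share: float = 0.9) -> int:
    """Return the smallest number of pages whose combined views reach `share` of the total."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    total = sum(pageviews.values())
    if total == 0:
        return 0
    running = 0
    # Walk pages from most to least viewed, accumulating views as we go.
    for count, views in enumerate(sorted(pageviews.values(), reverse=True), start=1):
        running += views
        if running >= share * total:
            return count
    return len(pageviews)


# Hypothetical per-page pageview counts (page title -> views).
sample = {"A": 700, "B": 150, "C": 100, "D": 40, "E": 10}
print(pages_for_traffic_share(sample))
```

With a real cube, the same calculation could be split by project, language, or topic to see how concentrated traffic is in each slice.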

Draft of what this cube could look like

Values:

  • Pageviews
  • (Edits?!)

Page properties

  • topic!
  • Namespace (or just main/not main?)
  • Project
  • Project family
  • Language
  • Some simple quality score we never show (ORES or other)
  • Age
  • num of editors
  • num of edits
  • length/size
  • num of pictures
  • Time since last edit
  • Revert rate
  • num of watchers
  • Templates (I assume this would be computationally hazardous)
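
To make the draft a little more concrete, one row of such a cube could be modeled roughly as below, with a rollup that contrasts the number of pages and the number of pageviews per topic (the beetle species versus television shows point raised earlier). The field names, the `rollup_by_topic` helper, and the sample rows are all hypothetical; this is not an existing schema in Turnilo or Superset, and each row is assumed to represent one page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ContentRow:
    # Dimensions: a small subset of the page properties in the draft.
    project: str
    language: str
    namespace: int
    topic: str
    # Value (measure).
    pageviews: int


def rollup_by_topic(rows: Iterable[ContentRow]) -> Dict[str, Tuple[int, int]]:
    """Map each topic to (number of pages, total pageviews)."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row.topic][0] += 1
        totals[row.topic][1] += row.pageviews
    return {topic: (pages, views) for topic, (pages, views) in totals.items()}


# Made-up rows for illustration only.
rows = [
    ContentRow("wikipedia", "en", 0, "beetles", 12),
    ContentRow("wikipedia", "en", 0, "beetles", 8),
    ContentRow("wikipedia", "en", 0, "beetles", 5),
    ContentRow("wikipedia", "en", 0, "television", 4000),
]
print(rollup_by_topic(rows))
```

Counting pages alone would rank "beetles" first here, while counting pageviews ranks "television" first, which is the distinction the cube is meant to expose.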

Related
Similar to:
"Edit" equivalent of pageviews daily available to use in Turnilo and Superset
https://phabricator.wikimedia.org/T211173

Related tasks:
descriptive metric: traffic distribution
https://phabricator.wikimedia.org/T190174

[REQUEST] En Wiki pageviews by topic. Rough cut.
https://phabricator.wikimedia.org/T221891

Relationship between content and traffic by wiki
https://phabricator.wikimedia.org/T190113

Related Objects

Status     Subtype  Assigned     Task
Open                kzimmerman
Duplicate           Mayakp.wiki
Resolved            cchen
Open                cchen
Open                cchen
Resolved            cchen
Resolved            cchen
Resolved            cchen
Duplicate  Spike    mpopov
Resolved   Spike    cchen
Resolved            cchen
Open                cchen
Open       Spike    cchen
Open                kzimmerman
Open                cchen
Open                jwang
Open                cchen
Resolved            MMiller_WMF
Resolved            cchen
Declined            None
Open                None
Open                kzimmerman

Event Timeline

LGoto triaged this task as Medium priority. Oct 7 2019, 4:47 PM
LGoto moved this task from Triage to Backlog on the Product-Analytics board.

@JKatzWMF moving this to the backlog; we'll reference it when we start modeling data for content in Q3

These were interesting and helpful metrics to review for GLOW India articles:

  • Namespace (or just main/not main?)
  • Project
  • Age
  • num of editors
  • num of edits
  • length/size
  • num of watchers
  • Time since last edit
  • Links

I would second these for inclusion in a Superset cube.