The product department has developed a framework for thinking about how value is created through our ecosystem.
- Content drives readership (content and pageviews ✔️)
- Readership inspires/drives editors (editors and editor lifecycle)
- Editors create content (edits ✔️ and content)
- Content drives readership, and the cycle repeats.
We are working to better understand the components of this "flywheel", or virtuous cycle, and better data is key to unlocking this. The two remaining pillars of the model (pageviews and edits are already covered, as marked ✔️ above) are editors and content. This ticket is about content.
There is an important class of questions that we currently cannot answer, and this gap limits our ability to make product decisions:
What kind of content drives readership?
A few examples of the questions that follow from this:
- How many pages drive 90% of traffic?
- What topics do our users care about most? We often go by the number of pages affected by a change rather than the number of pageviews, which tends to favor topics like beetle species or proteins over television shows.
- What is the ratio between quality, effort, and consumption? Are there areas of missed opportunity? (Shoutout to @nettrom_WMF's work.)
- What is the average number of watchers for highly trafficked pages, and has it changed over time?
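The first two questions above reduce to simple computations over per-page pageview counts once that data is available. A minimal sketch, assuming a hypothetical input of `(page_title, topic, pageviews)` tuples (not an existing Wikimedia dataset or API) and topic labels coming from some external classifier: `pages_for_traffic_share` finds the smallest number of pages that together account for 90% of views, and `topic_shares` contrasts each topic's share of pages with its share of pageviews, which is exactly the gap between "pages affected" and "pageviews affected".

```python
from collections import defaultdict


def pages_for_traffic_share(views, share=0.9):
    """Smallest number of pages whose combined views reach `share` of all views.

    `views` is an iterable of per-page pageview counts (hypothetical input).
    Taking the most-viewed pages first yields the minimal page count.
    """
    counts = sorted(views, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running >= share * total:
            return n
    return len(counts)


def topic_shares(rows):
    """Map each topic to (share of pages, share of pageviews).

    `rows` holds (page_title, topic, pageviews) tuples; topic labels are
    assumed to come from an external topic model (hypothetical here).
    """
    pages = defaultdict(int)
    views = defaultdict(int)
    for _title, topic, pv in rows:
        pages[topic] += 1
        views[topic] += pv
    n_pages = sum(pages.values())
    n_views = sum(views.values())
    return {
        topic: (pages[topic] / n_pages, views[topic] / n_views if n_views else 0.0)
        for topic in pages
    }


if __name__ == "__main__":
    # Toy numbers for illustration only, not real pageview data.
    rows = [
        ("Beetle A", "beetles", 10),
        ("Beetle B", "beetles", 5),
        ("Beetle C", "beetles", 5),
        ("Show X", "television", 980),
    ]
    print(pages_for_traffic_share(pv for _, _, pv in rows))  # 1
    print(topic_shares(rows))
```

In the toy data, beetles make up 75% of pages but only 2% of views, which is the bias described above when changes are counted by pages rather than pageviews.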
Draft of what this cube could look like:
- Namespace (or just main/not main?)
- Project family
- Some simple quality score we never show (ORES or other)
- Number of editors
- Number of edits
- Number of pictures
- Time since last edit
- Revert rate
- Number of watchers
- Templates (I assume this would be computationally expensive)
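To make the draft cube concrete, here is one possible per-page row layout with a simple rollup. This is a sketch under stated assumptions, not an agreed schema: the field names, types, and units are hypothetical, `wiki` and `page_id` are added only as row keys, templates are left out given the cost concern above, and the rollup groups by project family and the main/not-main namespace split to show how a question like average watchers per cell could be answered.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContentCubeRow:
    """One page per row of a hypothetical content cube (all names assumed)."""
    wiki: str                        # hypothetical key, e.g. "enwiki"
    page_id: int                     # hypothetical key
    is_main_namespace: bool          # the "just main/not main?" variant
    project_family: str              # e.g. "wikipedia", "wiktionary"
    quality_score: Optional[float]   # e.g. an ORES-style score, never shown to readers
    num_editors: int
    num_edits: int
    num_pictures: int
    days_since_last_edit: int
    revert_rate: float               # assumed: reverted edits / total edits, in [0, 1]
    num_watchers: int
    # Templates omitted: flagged in the draft as potentially expensive to compute.


def rollup(rows):
    """Aggregate page count, total edits, and mean watchers per
    (project_family, is_main_namespace) cell."""
    cells = defaultdict(lambda: {"pages": 0, "edits": 0, "watchers": 0})
    for r in rows:
        cell = cells[(r.project_family, r.is_main_namespace)]
        cell["pages"] += 1
        cell["edits"] += r.num_edits
        cell["watchers"] += r.num_watchers
    return {
        key: {
            "pages": c["pages"],
            "edits": c["edits"],
            "mean_watchers": c["watchers"] / c["pages"],
        }
        for key, c in cells.items()
    }
```

Storing one row per page keeps the cube flexible: dimensions such as project family or namespace become group-by keys, and additive metrics (edits, watchers) can be summed before averaging.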
"Edit" equivalent of pageviews, available daily for use in Turnilo and Superset
descriptive metric: traffic distribution
[REQUEST] En Wiki pageviews by topic. Rough cut.
Relationship between content and traffic by wiki
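For the "traffic distribution" descriptive metric listed above, one possible single-number summary is the Gini coefficient of per-page pageviews. This is our suggestion, not something the ticket specifies: it is 0 when every page gets the same traffic and approaches 1 when a few pages get nearly all of it. The sketch assumes a plain list of per-page pageview counts as input and uses the standard sorted-order formula.

```python
def gini(views):
    """Gini coefficient of per-page pageview counts.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n with x sorted
    ascending and i starting at 1. Returns 0.0 for empty or all-zero input.
    """
    xs = sorted(views)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n
```

For n pages the maximum value is (n - 1) / n, reached when a single page receives all views, so comparisons across wikis of very different sizes should keep that bound in mind.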