
Cross-check Wiki Comparison Dataset with Key Product Metrics
Closed, Resolved · Public

Description

Compare the Wiki Comparison dataset with the Key Product Metrics to:

  • double-check that the numbers are accurate
  • explain differences in definitions where metrics sound similar

Check data for:

  • 2020
  • 2019
  • 2018

Check:

  • unique devices
  • pageviews
  • editors
  • edits


Summary of Metric Differences

Unique devices
First, the product metrics report unique devices for only one project family: wikipedia. The wiki comparison reports 16 project families: wikipedia, commons, incubator, foundation, mediawiki, meta, sources, species, wikibooks, wikidata, wikinews, wikiquote, wikisource, wikiversity, wikivoyage, and wiktionary.

Looking at the wikipedia family alone, the number of unique devices in the wiki comparison is 25% higher than in the product metrics.

| 2020 monthly average | Product metrics | Wiki comparison | Wiki comparison / Product metrics |
| --- | --- | --- | --- |
| Unique devices (all Wikipedias) | 1.65B | 2.06B | 1.25x |

This is because the two calculations use different source tables, which aggregate data differently. The product metrics use the wmf.unique_devices_per_project_family_monthly table, which is aggregated based on the global last-access cookie (WMF-Last-Access-global), so a device is counted once for the whole project family. The wiki comparison uses the wmf.unique_devices_per_domain_monthly table, which is aggregated based on the per-project last-access cookie (WMF-Last-Access).

Therefore, a device that visits several Wikipedia projects in the same month is counted once per project in the wiki comparison, and adding the per-project counts together gives a total about 25% higher than the product metric.
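A minimal sketch of the double counting, using invented device IDs. The real unique-devices pipeline estimates counts from the last-access cookies and never tracks device identities, so the domains and sets below are hypothetical; they only show that summing per-domain counts counts a multi-wiki device once per domain, while a family-wide count keeps it once.

```python
# Hypothetical monthly visitors per Wikipedia domain; the IDs stand in for devices.
# (Real unique-devices data is estimated from last-access cookies, not from IDs.)
visits_by_domain: dict[str, set[str]] = {
    "en.wikipedia.org": {"d1", "d2", "d3", "d4"},
    "de.wikipedia.org": {"d3", "d5"},
    "fr.wikipedia.org": {"d4", "d5", "d6"},
}


def summed_per_domain(visits: dict[str, set[str]]) -> int:
    """Wiki-comparison style: count devices per domain, then add the counts."""
    return sum(len(devices) for devices in visits.values())


def deduplicated_per_family(visits: dict[str, set[str]]) -> int:
    """Product-metrics style: each device counts once for the whole family."""
    return len(set().union(*visits.values()))


summed = summed_per_domain(visits_by_domain)
family = deduplicated_per_family(visits_by_domain)
print(summed, family, round(summed / family, 2))
```

Here the per-domain sum is 9 against 6 deduplicated devices, so the size of the inflation depends entirely on how much readers overlap across wikis.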

Pageviews
In the product metrics, pageviews include two agent types: user and automated. In the wiki comparison, pageviews include only one agent type: user. Since May 2020, a portion of the traffic previously classified as user has been identified and tagged as automated; it is estimated at around 5% of the original user traffic. As a result, total pageviews in the wiki comparison are about 5% lower than in the product metrics.

| 2020 monthly average | Product metrics | Wiki comparison | Wiki comparison / Product metrics |
| --- | --- | --- | --- |
| Pageviews | 19.2B | 18.3B | 0.95x |
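The pageview difference comes down to which agent types are summed. A minimal sketch with invented totals, chosen so that automated traffic is 5% of the combined total, reproduces the 0.95 ratio. The agent type labels `user` and `automated` come from the description above; the dictionary layout and the numbers are assumptions for illustration only.

```python
# Invented monthly pageview totals by agent type (illustrative only).
pageviews_by_agent: dict[str, int] = {"user": 950, "automated": 50}

PRODUCT_METRICS_AGENTS = {"user", "automated"}  # both agent types
WIKI_COMPARISON_AGENTS = {"user"}               # user traffic only


def total_pageviews(counts: dict[str, int], agents: set[str]) -> int:
    """Sum pageviews over the agent types that a metric includes."""
    return sum(views for agent, views in counts.items() if agent in agents)


product = total_pageviews(pageviews_by_agent, PRODUCT_METRICS_AGENTS)
comparison = total_pageviews(pageviews_by_agent, WIKI_COMPARISON_AGENTS)
print(product, comparison, comparison / product)
```

With real data the gap varies by month, depending on how much traffic is tagged as automated in that month.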

Editors
Total active editors in the wiki comparison are 1.26 times the active editors in the monthly product metrics. This is because the two reports define active editors differently and count global editors differently. In the monthly product metrics, active editors are registered users who made at least 5 content edits across all projects. In the wiki comparison, active editors are registered users who made at least 5 content edits in that project.

Two effects pull in opposite directions. On a single project, the wiki comparison counts fewer editors because its threshold is stricter (5 content edits in one project rather than across all projects). But when the active editors of each project are added together into the wiki comparison total, global editors active on several wikis are counted multiple times, whereas the product metrics count each global editor only once, no matter how many wikis they contributed to. The net result is 26% more active editors in the wiki comparison when all wikis are summed. The same reasoning applies to new active editors.

| 2020 monthly average | Product metrics | Wiki comparison | Wiki comparison / Product metrics |
| --- | --- | --- | --- |
| Active editors | 92,378 | 116,202 | 1.26x |
| New active editors | 19,375 | 20,821 | 1.07x |
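Both effects can be seen in a toy example. This is a minimal sketch with hypothetical users, wikis, and edit counts; only the threshold of 5 content edits comes from the definitions above. It applies the global rule (sum each user's edits across all wikis, count the user once) and the per-wiki rule (count a user on each wiki where they reach the threshold, then add the per-wiki counts).

```python
from collections import defaultdict

# Hypothetical content-edit counts for one month: (user, wiki) -> edits.
content_edits: dict[tuple[str, str], int] = {
    ("Alice", "enwiki"): 3, ("Alice", "dewiki"): 4,        # 7 total, <5 on each wiki
    ("Bob", "enwiki"): 6, ("Bob", "wikidatawiki"): 9,      # >=5 on two wikis
    ("Carol", "frwiki"): 12,                               # one wiki only
    ("Dave", "enwiki"): 5, ("Dave", "commonswiki"): 5,     # >=5 on two wikis
}
THRESHOLD = 5  # "active" means at least 5 content edits in the month


def product_metrics_active(edits: dict[tuple[str, str], int]) -> int:
    """Global rule: total each user's edits across all wikis, count each user once."""
    totals: defaultdict[str, int] = defaultdict(int)
    for (user, _wiki), n in edits.items():
        totals[user] += n
    return sum(1 for n in totals.values() if n >= THRESHOLD)


def wiki_comparison_active(edits: dict[tuple[str, str], int]) -> int:
    """Per-wiki rule: count each (user, wiki) pair at the threshold, then sum."""
    return sum(1 for n in edits.values() if n >= THRESHOLD)


print(product_metrics_active(content_edits), wiki_comparison_active(content_edits))
```

In this toy data Alice is missed by the per-wiki rule (7 edits in total but fewer than 5 on any one wiki), while Bob and Dave are each counted twice, so the per-wiki sum is 5 against 4 global active editors (1.25x). In the real data the double counting likewise outweighs the stricter threshold.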