Global Editor Metrics - Data Pipeline - edit_per_editor_per_page_daily
Open, Needs Triage, Public

Description

To compute per-editor pageview metrics, we need a Data Lake table, updated daily, that lets us look up the pages a user edited on or before a given date.

See the description of parent task T405039: Global Editor Metrics - Data Pipeline for more detail and options.

This table can be backfilled using mediawiki_history (after T365648) and then computed daily going forward from mediawiki_content_history_v1 (via T406515).

Done is

  • A Hive table exists that, given a user_central_id and a date, can look up the list of (wiki_id, page_id) pairs that the user has edited on or before that date.
  • The Hive table is updated daily.
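
To make the lookup requirement concrete, here is a minimal Python sketch of the intended semantics. It is not the HQL from the patch: the row shape (`EditRow`, with a `dt` day column and an `edit_count`) and the helper names `aggregate_daily` and `pages_edited_on_or_before` are assumptions made for illustration only. The actual table schema may differ.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class EditRow(NamedTuple):
    """Hypothetical daily row: edits by one editor to one page on one day."""
    user_central_id: int
    wiki_id: str
    page_id: int
    dt: date
    edit_count: int


# A revision event keyed the same way as a daily row (assumed shape).
RevisionKey = Tuple[int, str, int, date]


def aggregate_daily(revisions: Iterable[RevisionKey]) -> List[EditRow]:
    """Collapse revision events into one row per (editor, wiki, page, day)."""
    counts: Dict[RevisionKey, int] = defaultdict(int)
    for key in revisions:
        counts[key] += 1
    return sorted(EditRow(*key, n) for key, n in counts.items())


def pages_edited_on_or_before(
    rows: Iterable[EditRow], user_central_id: int, as_of: date
) -> Set[Tuple[str, int]]:
    """Return the distinct (wiki_id, page_id) pairs the editor touched on or before as_of."""
    return {
        (r.wiki_id, r.page_id)
        for r in rows
        if r.user_central_id == user_central_id and r.dt <= as_of
    }


# Made-up sample events, not real data.
revisions = [
    (1, "enwiki", 10, date(2025, 10, 1)),
    (1, "enwiki", 10, date(2025, 10, 1)),
    (1, "dewiki", 5, date(2025, 10, 3)),
    (2, "enwiki", 10, date(2025, 10, 2)),
]
rows = aggregate_daily(revisions)
pages = pages_edited_on_or_before(rows, 1, date(2025, 10, 2))
```

Keeping one row per editor, page, and day means a daily run only has to append that day's rows, and the "on or before" condition becomes a plain range filter on the day column.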

Event Timeline

Ottomata updated the task description.
Ottomata added a subscriber: xcollazo.

Change #1196892 had a related patch set uploaded (by Ottomata; author: Ottomata):

[analytics/refinery@master] Add HQL for user_edited_pages_daily

https://gerrit.wikimedia.org/r/1196892

TIL about the wmf_contributors.editor_month table, which is a Movement Insights-owned Iceberg table that stores monthly edit counts per user (thank you Datahub!).

Should we also use Iceberg in wmf_contributors for user_edited_pages instead of Hive? Would the data model change? Hm.

Ottomata renamed this task from Global Editor Metrics - Data Pipeline - user_edited_pages to Global Editor Metrics - Data Pipeline - edit_per_user_per_page_daily. Oct 27 2025, 7:48 PM
Ottomata renamed this task from Global Editor Metrics - Data Pipeline - edit_per_user_per_page_daily to Global Editor Metrics - Data Pipeline - edit_per_editor_per_page_daily. Oct 27 2025, 8:27 PM

Change #1196892 merged by Aleksandar Mastilovic:

[analytics/refinery@master] Add HQL for edit_per_editor_per_page_daily and pageview_per_editor_per_page_daily

https://gerrit.wikimedia.org/r/1196892