
ETL pipeline for Pageviews Received to Potentially Vandalised Content
Open, High, Public

Description

For baseline measurement, an ETL pipeline is required to calculate "Pageviews Received to Potentially Vandalised Content" every month.

The suggested destination table schema is

  • month
  • wiki
  • platform (desktop/mobile)
  • number of potentially vandalized revisions
  • total pageviews to articles
  • total pageviews to potentially vandalized revisions
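
To make the suggested schema concrete, here is a minimal Python sketch of one destination row and the roll-up that would produce it. The field names, the `MonthlyRow` class, the `aggregate` function, and the shape of the input records are all hypothetical and are not taken from the baseline analysis notebook; the real pipeline would most likely do this in SQL or Spark against the actual source tables.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyRow:
    """One row of the suggested destination table (field names are assumed)."""
    month: str      # e.g. "2024-05"
    wiki: str       # e.g. "enwiki"
    platform: str   # "desktop" or "mobile"
    potentially_vandalized_revisions: int
    total_pageviews: int
    pageviews_to_potentially_vandalized: int


def aggregate(records):
    """Roll up per-revision pageview records into one row per (month, wiki, platform).

    Each record is a hypothetical tuple:
    (month, wiki, platform, revision_id, is_potential_vandalism, pageviews)
    """
    vandal_revisions = defaultdict(set)
    total_views = defaultdict(int)
    vandal_views = defaultdict(int)

    for month, wiki, platform, rev_id, flagged, views in records:
        key = (month, wiki, platform)
        total_views[key] += views
        if flagged:
            # Count each flagged revision once, however many records mention it.
            vandal_revisions[key].add(rev_id)
            vandal_views[key] += views

    return [
        MonthlyRow(
            month, wiki, platform,
            len(vandal_revisions[(month, wiki, platform)]),
            total_views[(month, wiki, platform)],
            vandal_views[(month, wiki, platform)],
        )
        for (month, wiki, platform) in sorted(total_views)
    ]
```

The flag `is_potential_vandalism` stands in for whatever the operational definition yields; the sketch only shows how the three measures relate to each other within one month, wiki, and platform.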

Other notes:

  • The pipeline should run monthly and depends on wmf.mediawiki_history.
  • The operational definition of potential vandalism is given in the baseline analysis notebook.
  • The target table is wmf_product.am_potential_vandal_pageviews_monthly (to be created).
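
Because the job runs monthly, each run needs to decide which month it is computing. The sketch below assumes a run processes the calendar month before its run date and labels it "YYYY-MM". The helper name `month_to_process` is hypothetical, and whether this label lines up with the partitioning of wmf.mediawiki_history is an assumption to verify against the table itself.

```python
from datetime import date


def month_to_process(run_date: date) -> str:
    """Return the calendar month before run_date as 'YYYY-MM' (assumed label format)."""
    year, month = run_date.year, run_date.month - 1
    if month == 0:
        # A January run covers December of the previous year.
        year, month = year - 1, 12
    return f"{year:04d}-{month:02d}"
```

For example, a run on any day in June 2024 would target "2024-05".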

Details

Other Assignee
KCVelaga_WMF

Event Timeline

KCVelaga_WMF triaged this task as High priority.
KCVelaga_WMF created this task.
KCVelaga_WMF updated Other Assignee, added: KCVelaga_WMF.
KCVelaga_WMF moved this task from Triage to Tracking on the Product-Analytics board.