
ETL pipeline for flaggedrevs metrics (pending frevs hourly)
Closed, Resolved | Public

Description

  • Revisions to be reviewed
  • Time elapsed since revision
  • Median time to be reviewed
  • Unique reviewers
  • Average # Reviews per reviewer
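As an illustration only, the metrics listed above could be computed from a set of revision records roughly as follows. This is a minimal sketch under stated assumptions: the record fields (`created`, `reviewed`, `reviewer`), the sample data, and the `summarize` helper are all hypothetical and do not reflect the FlaggedRevs schema or the queries used in the actual pipeline.

```python
from datetime import datetime, timezone
from statistics import median


def ts(hour):
    """Build a UTC timestamp on a fixed, made-up day (illustration only)."""
    return datetime(2024, 6, 1, hour, 0, tzinfo=timezone.utc)


# Hypothetical sample records; field names are assumptions, not the real schema.
REVISIONS = [
    {"rev_id": 1, "created": ts(0), "reviewed": ts(2), "reviewer": "A"},
    {"rev_id": 2, "created": ts(1), "reviewed": ts(5), "reviewer": "A"},
    {"rev_id": 3, "created": ts(3), "reviewed": ts(9), "reviewer": "B"},
    {"rev_id": 4, "created": ts(6), "reviewed": None, "reviewer": None},
]
NOW = ts(12)


def hours(delta):
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


def summarize(revisions, now):
    """Compute the five listed metrics for one snapshot of revisions."""
    pending = [r for r in revisions if r["reviewed"] is None]
    reviewed = [r for r in revisions if r["reviewed"] is not None]
    review_times = [hours(r["reviewed"] - r["created"]) for r in reviewed]
    reviewers = {r["reviewer"] for r in reviewed}
    return {
        # Revisions to be reviewed
        "pending_revisions": len(pending),
        # Time elapsed since revision, for each pending revision
        "pending_age_hours": [hours(now - r["created"]) for r in pending],
        # Median time to be reviewed
        "median_hours_to_review": median(review_times) if review_times else None,
        # Unique reviewers
        "unique_reviewers": len(reviewers),
        # Average number of reviews per reviewer
        "avg_reviews_per_reviewer": (
            len(reviewed) / len(reviewers) if reviewers else None
        ),
    }


summary = summarize(REVISIONS, NOW)
print(summary)
```

The empty-input guards return `None` rather than raising, so a wiki with no reviews in a given hour would still yield a row.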

Initial baseline analysis for reference: https://gitlab.wikimedia.org/kcvelaga/automoderator-measurement/-/blob/main/baselines/T348863_content_moderation_backlogs_flagged_revs.ipynb.ipynb?ref_type=heads

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Queries and notebooks for moderation metrics (related to Automoderator) | repos/product-analytics/data-pipelines!13 | kcvelaga | automoderator_puppet | main
Draft: dummy: review fr_pending script | kcvelaga/pyspark-conda-mariadb!1 | kcvelaga | dummy_mr | main
fr_pending_hourly | repos/product-analytics/data-pipelines!12 | kcvelaga | fr_pending | main
fr_pending_hourly | repos/product-analytics/data-pipelines!11 | kcvelaga | fr_pending | main

Event Timeline

KCVelaga_WMF triaged this task as Medium priority.
KCVelaga_WMF moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

Adding a note for future reference: this pipeline requires looping through a list of wikis. Some example DAGs for reference:
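Separately from the example DAGs referenced in this comment, the general "loop through a list of wikis" pattern can be sketched in plain Python. Everything here is an assumption made for illustration: the wiki list, the `run_for_wikis` helper, and the `fake_extract` stand-in are hypothetical and are not taken from the pipeline's code.

```python
from typing import Callable

# Hypothetical list of wikis; the real pipeline's list is not given in this task.
WIKIS = ["dewiki", "plwiki", "ruwiki"]


def run_for_wikis(wikis, extract: Callable[[str], list]):
    """Run `extract` once per wiki, tagging each row with its wiki.

    A failure on one wiki is recorded and does not stop the others.
    """
    rows, failed = [], []
    for wiki in wikis:
        try:
            for row in extract(wiki):
                rows.append({"wiki_db": wiki, **row})
        except Exception as exc:  # keep going; report failures at the end
            failed.append((wiki, str(exc)))
    return rows, failed


def fake_extract(wiki):
    """Stand-in for a per-wiki query; returns made-up numbers."""
    if wiki == "ruwiki":
        raise RuntimeError("replica unavailable")
    return [{"pending_revisions": len(wiki)}]


rows, failed = run_for_wikis(WIKIS, fake_extract)
print(rows)
print(failed)
```

In an Airflow DAG the same idea is usually expressed by creating one task per wiki inside the DAG definition, so that each wiki can be retried independently.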

KCVelaga_WMF renamed this task from ETL pipeline for flaggedrevs metrics to ETL pipeline for flaggedrevs metrics (pending frevs hourly). Jun 10 2024, 5:23 AM
KCVelaga_WMF changed the task status from Open to In Progress.

This DAG is finally working! Code related to the pipeline

Data is available at: wmf_product.moderation_flagged_revisions_pending_hourly

I will check the data again in a week and resolve the task if there are no issues.

Mentioned in SAL (#wikimedia-operations) [2025-03-13T14:00:43Z] <kcvelaga@deploy2002> Started deploy [airflow-dags/analytics_product@554407c]: T362615

Mentioned in SAL (#wikimedia-operations) [2025-03-13T14:01:18Z] <kcvelaga@deploy2002> Finished deploy [airflow-dags/analytics_product@554407c]: T362615 (duration: 01m 39s)