
Create ETL pipelines for Automoderator baseline metrics
Closed, Resolved (Public)

Description

Several baseline metrics related to Automoderator can vary substantially depending on when the query is run (for example, the number of pending changes to review can be 100 at one time and 1000 at another). Data processing pipelines (Airflow or cron) are to be created for the following:

  • Vandalism pageviews (frequency: weekly/daily)
  • Flagged revisions backlog (frequency: hourly)
  • Recent changes patrolling activity (frequency: daily)
    • unpatrolled
    • patrolled
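
Because these counts depend on when the query runs, each pipeline run has to store its result as a dated snapshot row. The following is a minimal sketch of that idea, not the actual job code: it uses an in-memory SQLite table as a stand-in for one wiki's recentchanges table, with column names following MediaWiki's rc_timestamp and rc_patrolled (0 = unpatrolled, 1 = patrolled, 2 = autopatrolled). The function names, the output row layout, and the choice to count only rc_patrolled = 1 as patrolled are assumptions made for illustration.

```python
import sqlite3


def make_toy_wiki_db(rows):
    """Build an in-memory stand-in for one wiki's recentchanges table.

    `rows` is a list of (rc_timestamp, rc_patrolled) tuples (hypothetical data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recentchanges ("
        "rc_id INTEGER PRIMARY KEY, rc_timestamp TEXT, rc_patrolled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO recentchanges (rc_timestamp, rc_patrolled) VALUES (?, ?)",
        rows,
    )
    return conn


def snapshot_patrol_counts(conn, wiki, snapshot_date):
    """Return one dated snapshot row with patrol counts for a wiki.

    Assumption: rc_patrolled = 0 counts as unpatrolled and rc_patrolled = 1
    as patrolled; autopatrolled changes (2) are left out of both counts.
    """
    unpatrolled, patrolled = conn.execute(
        "SELECT COALESCE(SUM(rc_patrolled = 0), 0), "
        "       COALESCE(SUM(rc_patrolled = 1), 0) "
        "FROM recentchanges"
    ).fetchone()
    return {
        "snapshot_date": snapshot_date,
        "wiki": wiki,
        "unpatrolled": unpatrolled,
        "patrolled": patrolled,
    }


if __name__ == "__main__":
    toy_rows = [
        ("20240625050000", 0),
        ("20240625060000", 0),
        ("20240625070000", 1),
        ("20240625080000", 2),
    ]
    conn = make_toy_wiki_db(toy_rows)
    print(snapshot_patrol_counts(conn, "testwiki", "2024-06-25"))
```

A scheduler (Airflow or cron, as mentioned above) would call something like snapshot_patrol_counts once per interval and append the row to a results table, so the baseline is a time series rather than a single number.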

Details

Related Changes in Gerrit:
Related Changes in GitLab:
  • DAGs to calculate metrics related to recentchanges patrolling (analytics_product)
    Reference: repos/data-engineering/airflow-dags!825 | Author: kcvelaga | Source Branch: recentchanges_daily_dags | Dest Branch: main
  • Initial Product Analytics moderation job (frevs)
    Reference: repos/data-engineering/airflow-dags!770 | Author: kcvelaga | Source Branch: moderation-job-artifact-1 | Dest Branch: main
  • Moderation queries
    Reference: repos/product-analytics/data-pipelines!15 | Author: kcvelaga | Source Branch: moderation_queries | Dest Branch: main

Event Timeline

KCVelaga_WMF changed the task status from Open to In Progress. (Jun 25 2024, 5:05 AM)
KCVelaga_WMF updated the task description.
KCVelaga_WMF changed the status of subtask T367016: ETL pipeline for unpatrolled recentchanges daily activity from Open to In Progress.

Change #1054903 had a related patch set uploaded (by Bearloga; author: KCVelaga):

[analytics/wmf-product/jobs@master] Add WMF data pipelines (git submodule) & scripts for regular runs

https://gerrit.wikimedia.org/r/1054903

Change #1054903 merged by Bearloga:

[analytics/wmf-product/jobs@master] Add WMF data pipelines (git submodule) & scripts for regular runs

https://gerrit.wikimedia.org/r/1054903

Mentioned in SAL (#wikimedia-analytics) [2024-08-14T17:18:10Z] <ottomata> scap deploy airflow analytics_product for vandalism_pageviews_dag - T362612

jebe merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/825

DAGs to calculate metrics related to recentchanges patrolling (analytics_product)

KCVelaga_WMF closed this task as Resolved. (Edited Sep 13 2024, 7:06 AM)

All the pipelines are successfully running!

Primary job repo: moderation-mariadb-jobs

  • flagged revisions pending hourly: Calculates pending flagged revisions to be reviewed, by hour and wiki.
  • unpatrolled recentchanges daily: Calculates pending unpatrolled recent changes daily, by wiki.
  • patrolled recentchanges daily: Calculates patrolled recent changes daily, by wiki.
  • vandalism pageviews monthly: Calculates pageviews of vandalized pages monthly, by wiki.